1 Introduction

In the digital era, information is abundant and accessible. From the ever-changing prices of financial contracts to the unstructured data of social media websites, the high volume of information creates a strong need for data analysis in the workplace. A company or organization benefits immensely when it can build a bridge between raw information from its environment and its strategic decisions. Undoubtedly, this is a prolific time for professionals skilled in using the right tools for acquiring, storing, and analyzing data.

In particular, datasets related to Economics and Finance are widely available to the public. International and local institutions, such as central banks, government research agencies, financial exchanges, and many others, provide their data publicly, either by legal obligation or to foster research. Whether you are looking into statistics for a particular country or a company, most information is just two clicks away. By analyzing this information efficiently and effortlessly, you’ll be able to offer valuable insights to your team.

Not surprisingly, in fields with abundant access to data and practical applications, such as economics and finance, a graduate student or data analyst is expected to have learned at least one programming language that allows them to do their work efficiently. Learning how to program is becoming a requirement for the job market. In this setting, the role and contribution of R shine. In the following sections, I will explain what R is and why you should use it.

1.1 What is R

R is a programming language specially designed to solve statistical problems and display graphical representations of data. R is a modern version of S, a programming language originally created in Bell Laboratories (formerly AT&T, now Lucent Technologies). The base code of R was developed by two academics, Ross Ihaka and Robert Gentleman, resulting in the programming platform we have today. For anyone curious about the name, the letter R was chosen because it is the first letter of the first names of both creators.

Today, R is almost synonymous with data analysis, with a large user base and well-established packages. Researchers from various fields, from economics to biology, are likely to find in R a significant body of preexisting code that facilitates their analysis.

On the business side, large and established companies, such as Google and Microsoft, have already adopted R as an internal language for data analysis. R is maintained by the R Foundation and the R Consortium, a collective effort to fund projects for extending the programming language.

1.2 Why Choose R

Learning a new programming language requires a lot of time and effort. Perhaps you’re wondering why you should choose R and invest time in learning it. Here are the main arguments.

First, R is a mature and stable platform, continuously supported and intensively used in the industry. When choosing R, you will have the computational background not only for an academic career in scientific research but also to work as a data analyst in private organizations. Due to its open license, you can use R anywhere. Also, the strong support from the community means it is very unlikely the R platform will ever fade away or be substituted. Depending on your career choices, R might be the only programming language you ever need to learn.

Learning R is easy. My experience in teaching R allows me to confidently state that students, even those with no programming experience, have no problem learning the language and using it to create their own code. The language is intuitive, and certain rules and functions can be extended to different cases. For example, the function print is used to show the contents of an object on the screen. You can use it for any kind of object. So, by learning the main concept, you’ll be able to apply it in different scenarios. Once you understand how the software expects you to think, it is easy to discover new features by building on what you already know. This generic notation facilitates the learning process.
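As a minimal sketch of this idea, the chunk below calls print on three objects of different classes. The object names and their contents are made up for illustration only.

```r
# three objects of different classes (illustrative values)
my_num <- c(1.5, 2, 3.5)
my_char <- 'abc'
my_df <- data.frame(ticker = c('A', 'B'), 
                    price = c(10, 20))

# the same function displays each of them
print(my_num)
# R> [1] 1.5 2.0 3.5
print(my_char)
# R> [1] "abc"
print(my_df)
```

The call is identical in all three cases. R takes care of choosing how each kind of object should be displayed.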

The engine of R and the interface of RStudio create a highly productive environment. The graphical interface provided by RStudio facilitates the use of R and increases productivity by introducing new features to the platform. By combining both, the user has at their disposal many tools that facilitate the development of research scripts and other projects.

CRAN packages allow the user to do many different things with R. We will soon learn that we can import external code directly into R as individual modules (packages) and use it for different purposes. These packages extend the basic language of R and enable the most diverse functionalities. You can, for example, use R to write and publish a book, build and publish a blog, create exams with dynamic content, write random jokes and poems (seriously!), send emails, access and collect data from the internet, and much more. It is truly impressive what you can do with just a couple of lines of code in R.

R is compatible with different operating systems and it can interface with different programming languages. If you need to execute code from another programming language, such as C++, Python, or Julia, it is easy to integrate it with R. Therefore, the user is not restricted to a single programming language and can easily use features and functions from others. For example, C++ is well known for its superior speed in numerical tasks. From an R script, you can use the package Rcpp (Eddelbuettel et al. 2021) to write a C++ function and effortlessly use it within your R code.

R is free! The main software and all its packages are free. A generous license motivates the adoption of the R language in a business environment, where obtaining individual and collective licenses of commercial software can be costly. This means you can take R anywhere you go.

1.3 What Can You Do With R and RStudio?

R is a fairly complete programming language and virtually any computational problem can be solved with it. Given the adoption of R in different areas of knowledge, the list is extensive. In finance and economics, I can highlight the following possibilities:

  • Substitute and improve data-intensive tasks from spreadsheet-like software;

  • Develop routines for managing investment portfolios and executing financial orders;

  • Create tools for calculating and reporting economic indices such as inflation and unemployment;

  • Perform empirical data research using statistical techniques, such as econometric models and hypothesis testing;

  • Create dynamic websites with the shiny (Chang et al. 2021) package, allowing anyone in the world to use a computational tool created by you;

  • Automate the process of writing technical reports with the RMarkdown technology.

Moreover, public access to packages developed by users further expands these capabilities. The CRAN views website offers Task Views panels for the topics of Finance and Econometrics. There you can find the main packages to perform specific operations such as importing financial data from the internet, estimating econometric models, and calculating different risk estimates, among many other possibilities. Reading these pages and getting to know these packages is essential for those who intend to work in Finance and Economics. It is worth noting, however, that the complete list of packages is much larger.

Be aware that R has a consistent release schedule. Every four months a new version of R is released, fixing bugs and implementing new solutions. There are two main types of releases, major and minor. For example, today, 2021-02-24, the latest version of R is 4.0.4. The first digit (“4”) indicates the major release while all others are of the minor type. Generally, the minor changes are very specific and, possibly, will have little impact on your work.

However, unlike minor releases, major releases are fully reflected in the R package ecosystem. Every time you install a new major version of R, you will have to reinstall all packages. The problem is that a new major release often comes with package incompatibility issues. My advice is: every time a new major release of R comes out, wait a few months before installing it on your machine. This way, the authors of the packages will have more time to update their code, minimizing the possibility of compatibility problems.
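Once R is installed, you can check which release you are running and which packages you already have, which is useful information before any upgrade. The chunk below is only a sketch using functions from base R. The object names are illustrative and the output will differ from machine to machine.

```r
# version string of the R session that is running
print(R.version.string)

# major and minor components, stored as text
r_major <- R.version$major
r_minor <- R.version$minor

# is this session running R 4.0.0 or newer?
is_r4 <- getRversion() >= '4.0.0'
print(is_r4)

# names of all packages in the local library, 
# worth saving before a major upgrade
my_pkgs <- rownames(installed.packages())
print(length(my_pkgs))
```

With the vector my_pkgs saved somewhere, you know exactly which packages need to be reinstalled after moving to a new major release.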

1.4 Installing R

Before going any further, let’s install the required software on your computer. The most direct and practical way to install R is to direct your browser to the R website and click the Download link on the left side of the page, as shown in Figure 1.1.

Figure 1.1: Initial page for downloading R

The next screen gives you a choice of the mirror from which to download the installation files. The CRAN repository (Comprehensive R Archive Network) is mirrored in various parts of the world. You can choose one of the links from the location nearest to you. If undecided, just select the mirror 0-Cloud (see Figure 1.2), which will automatically take you to the nearest location.

Figure 1.2: Choosing the CRAN mirror
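The same choice of mirror applies later, when R downloads packages. As a small sketch, and assuming you want the 0-Cloud mirror at https://cloud.r-project.org, the repository can be set for the current session with the base function options:

```r
# point this R session to the 0-Cloud mirror
options(repos = c(CRAN = 'https://cloud.r-project.org'))

# confirm the setting
print(getOption('repos'))
```

This only affects the running session. RStudio also lets you pick a mirror in its own settings, so the code above is optional.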

The next step involves selecting your operating system, likely to be Windows. From now on, due to the greater popularity of this platform, we will focus on installing R on Windows. The instructions for installing R on other operating systems can be easily found online. Regardless of the underlying platform, using R is about the same. There are a few exceptions, especially when R interacts with the file system. In the content of the book, special care was taken to choose functions that work the same way in different operating systems. A few exceptions are highlighted throughout the book. So, even if you are using a Mac or a flavor of Linux, you can take full advantage of the material presented here.

Figure 1.3: Choosing the operating system
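To illustrate what a platform-independent function looks like when dealing with the file system, here is a short sketch with file.path from base R. It is just one example of this kind of function, and the folder and file names are hypothetical.

```r
# build a path from its parts (hypothetical folder and file names)
my_path <- file.path('data', 'prices.csv')
print(my_path)
# R> [1] "data/prices.csv"

# extract the pieces of the path
print(basename(my_path))
# R> [1] "prices.csv"
print(dirname(my_path))
# R> [1] "data"
```

Because the path is assembled by R instead of being typed by hand, the same line of code can be used on Windows, macOS, or Linux.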

After clicking the link Download R for Windows, as in Figure 1.3, the next screen will show the following download options: base, contrib, old.contrib and Rtools. The first (base) should be selected. It contains the download link to the executable installation file of R for Windows.

If you are interested in creating and distributing your own R packages, Rtools should also be installed. For most users, however, this should not be the case. If you don’t intend to write packages, you can safely ignore Rtools for now. The links to contrib and old.contrib relate to files for the current and old releases of R packages and can also be ignored. We will discuss the use of packages in the next chapter.

Figure 1.4: Installation options

After clicking the link base, the next screen will show the link to download the R installation file (Figure 1.5). After downloading the file, open it and follow the steps in the installation screen. At this time, no special configuration is required. I suggest keeping all the default choices and simply hitting accept in the displayed dialogue screens. After the installation of R, it is strongly recommended to install RStudio, which will be addressed next.

Figure 1.5: Downloading R

1.5 Installing RStudio

The base installation of R includes its own GUI (graphical user interface), where we can write and execute code. However, this native interface has several limitations. RStudio substitutes the original GUI and makes access to R more practical and efficient. One way to understand this relationship is through an analogy with cars. While R is the engine of the programming language, RStudio is the body and instrument panel, which significantly improves the user experience. Besides presenting a more attractive look, RStudio also adds several features that make the life of a programmer easier, such as the creation of projects, packages, and dynamic documents, among others.

The installation of RStudio is simpler than that of R. The files are available on the RStudio website. After accessing the page, click Download RStudio and then Download RStudio Desktop. After that, just select the installation file for the operating system on which you will work. This option is probably WINDOWS Vista 7/8/10. Note that, like R, RStudio is also available for alternative platforms.

I emphasize that using RStudio is not essential to develop programs in R. Other interfaces are available and can be used. However, in my experience, RStudio offers a vast range of features for the language and is widely used, which justifies its choice.

1.6 Resources on the Web

The R community is vibrant and engaging. There are many authors, such as myself, who constantly release material about R in their blogs. This material includes announcements of new packages, posts about data analysis in real life, curiosities, rants, and tutorials. R-Bloggers is a website that aggregates these blogs, making it easier for anyone to access and participate. I strongly recommend signing up for the R-Bloggers feed in RSS, Facebook or Twitter. Not only will you be informed of what is happening in the R community, but you will also learn a lot by reading other people’s code and articles.

Learning and using R can be a social experience. Several conferences and user groups are available in many countries, and a complete list is maintained online. I also suggest looking on social platforms for local R groups in your region.

1.7 Structure and Organization

This book presents a practical approach to using R in finance and economics. To get the most out of this book, I suggest you first seek to understand the code shown, and only then, try using it on your own computer. Whenever you find a piece of code that you do not understand, go on and study it. At first, it might seem like a daunting task but, with time, be confident that the learning process will get a lot easier as the code blocks start to make sense.

Learning to program on a new platform is like learning a foreign spoken language: using it in day-to-day problems is imperative to create fluency. All the code and data used in this book are available with the installation of package afedR (see the preface for instructions on how to install it). I suggest you test the code on your computer and play with it, modifying the examples and checking the effect of changes in the outputs. Whenever you have a computational problem, try using R to solve it. You’ll stumble and make mistakes at first. But I guarantee that, soon enough, you’ll be able to handle complex data tasks effortlessly.

Throughout the book, every demonstration of code will have two parts: the R code and its output. The output is nothing more than the result of the commands on the screen. All code inputs and outputs will be marked in the text with a special format. See the following example:

# create a list
L <- list('abc', 1:5, 'dec')

# print list
print(L)
R> [[1]]
R> [1] "abc"
R> 
R> [[2]]
R> [1] 1 2 3 4 5
R> 
R> [[3]]
R> [1] "dec"

For the previous chunk of code, lines L <- list('abc', 1:5, 'dec') and print(L) are actual commands given to R. The output of this simple piece of code is the on-screen presentation of the contents of object L. The symbol R> is used for any code output. Notice also that inline comments are set with the symbol #. Anything on the right side of # is not evaluated by R. These comments serve as written notes about the code.

The code can also be spatially organized using newlines. This is a common strategy around arguments of functions. The next chunk of code is equivalent to the previous and will run the exact same way. Notice how we used a new line to vertically align the arguments of function list. You’ll soon see that, throughout the book, this type of vertical alignment is constantly used.

# create a list
L <- list('abc', 
          1:5, 
          'dec')

# print list
print(L)
R> [[1]]
R> [1] "abc"
R> 
R> [[2]]
R> [1] 1 2 3 4 5
R> 
R> [[3]]
R> [1] "dec"

The code also follows a well-defined structure. One decision in writing computer code is how to name objects and how to structure the code itself. It is recommended to follow a clear pattern, so the code is easy to maintain over time and can be used and understood by others. For this book, a mixture of the author’s personal choices and the coding style suggested by Google was used. The reader, however, may choose the structure they find more efficient and aesthetically pleasing. Like many things in life, this is a choice. We will return to the discussion of code structure in chapter 13.
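As a brief illustration of what a consistent pattern can look like, the chunk below uses lowercase object names separated by underscores and spaces around operators. This is just one possible convention, shown with made-up values, and not a rule imposed by R.

```r
# descriptive names, lowercase, words separated by underscores
price_start <- 100
price_end <- 110

# spaces around operators make the calculation easier to read
total_return <- price_end / price_start - 1

print(total_return)
# R> [1] 0.1
```

Whatever pattern you choose, the important part is to apply it the same way across all of your scripts.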

1.8 Exercises


Q.1

The R language was developed based on what other programming language?


Your Answer:

The solution is S.
Straight from the book, section What is R: “R is a modern version of S, a programming language originally created in Bell Laboratories (formerly AT&T, now Lucent Technologies).”

Q.2

What are the names of the two authors of R?


Your Answer:

The solution is Ross Ihaka and Robert Gentleman.

Straight from the book: “… The base code of R was developed by two academics, Ross Ihaka and Robert Gentleman, resulting in the programming platform we have today.”


Q.3

Why is R special when compared to other programming languages, such as Python, C++, JavaScript and others?


Your Answer:

The solution is: it was designed for analyzing data and producing statistical output.
Undoubtedly, the main distinguishing feature of the R language is the ease with which data can be analyzed on the platform. Although other languages also allow data analysis, it is in R where this process is supported by a wide range of specialized packages.

Q.4

What was the reason the programming language was named R?


Your Answer:

The solution is: the letter R is shared in the first names of its authors.
The letter R was chosen because it is the first letter of the first names of the two authors of the platform.

Q.5

Consider the following alternatives about R and RStudio:

I - R is a mature and stable programming platform;

II - RStudio is a modern interface to R, increasing productivity;

III - R is not compatible with different programming languages;

Which alternatives are correct?


Your Answer:

The solution is TRUE, TRUE, FALSE.
See section “Why Choose R” in the “Introduction” chapter.

Q.6

Once you have R and RStudio installed, head over to the CRAN package website and look for technologies you use in your work. For example, if you use Google Sheets extensively in your work, you will soon discover that there is a package in CRAN that interacts with spreadsheets in the cloud.

  1. Browse CRAN package website
  2. Search for technologies you use in your work (Excel, Word, Google Docs, …)

Q.7

On the CRAN site you can also install the Rtools application. What is it for?


Your Answer:

The solution is: compile R packages locally.

Rtools is an extension particular to R on Windows. It is used to compile packages from source code and is a requirement for those who develop packages. For the average user, however, it is also recommended to install Rtools as some packages require such compilation.

For Linux/Unix or macOS users, Rtools is not necessary as, generally, compilers are already provided by the operating system itself.


Q.8

Use Google to search for R groups in your region. Check if the meetings are frequent and, if you don’t have a major impediment, go to one of these meetings and make new friends.

It is not uncommon for programmers to have a tendency for introversion. This was certainly my case at the beginning of my career. But know that shyness is not a permanent state; it is a defense mechanism you created yourself against threats that don’t really exist! Just as you improve in any sport the more often you practice it, you’ll become more communicative simply by speaking more.

The sad (or not) reality for the timid is that communication is a fundamental part of adult life and is a way to maintain your professional network. The more people who know your work and your personality, the better. Perhaps a person you met in one of these groups can refer you to a job vacancy or future project. So, in short, what do you really have to lose by going to one of these meetings?


Q.9

Go to the R-Bloggers website and look for a topic of interest to you, such as football (soccer) or investments. Read at least three of the blog posts you find.

I am particularly passionate about the sport of tennis. On the R-Bloggers website I’ve found the following articles mixing R and tennis:

Using R to study the evolution of Tennis

Visualizing Tennis Grand Slam Winners Performances

Tennis Grand Slam Tournaments Champions Basic Analysis


Q.10

If you work in an institution with data infrastructure, talk to the person in charge of the IT department and verify what technologies are used. Check if, through R, it is possible to access all tables in the databases. For now, there is no need to write code. Just check if this possibility exists.

At the university, we have access to different paid repositories for financial data. Unfortunately, none of them offers any type of API for communicating with R. In fact, this was one of my motivations for writing R packages for free access to financial data.