Class Name: Example Class
Exercise Name: Introduction to R
Student Name: John Wick
Student Version: 1
Number of questions: 91
Date and time of compilation: 2019-12-02 14:00:48 at Dell-Desktop
The data for these exercises is available in package afedR, which accompanies the book Analyzing Financial and Economic Data with R. So, first install the package with the command devtools::install_github('msperlin/afedR') and, as an example for reading a table from file "example.rds", simply type:
df <- readRDS(afedR::afedR_get_data_file('example.rds'))
What is the name of the function that shows the representation of an object in R's prompt?
From the list of file extensions presented below, what is the most likely file extension to be used with function source()?
In a particular part of our code we used function load. What is the extension of the file used as input in this function?
Consider the following code.
my.idx <- -5:87
x <- my.idx[3]
y <- my.idx[length(my.idx)-2]
Without executing the code, what are the contents of objects x and y?
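As a reminder of how positional indexing works, here is a minimal sketch on a different, purely illustrative vector (the names demo.idx, third and third.from.end are hypothetical and not part of the exercise). R indexes from 1, and length() returns the number of elements.

```r
# Illustrative vector, deliberately different from the one in the question
demo.idx <- 10:20

# R is 1-indexed, so [3] picks the third element
third <- demo.idx[3]

# length(demo.idx) is 11, so this picks element 9 (third from the end)
third.from.end <- demo.idx[length(demo.idx) - 2]

print(c(third, third.from.end))  # 12 18
```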
The following code was executed in R.
my.vec <- runif(100)
Which of the following commands will result in an error?
Consider the execution of the following R code:
rm(list = ls())
x=1:100
y=2:100
my.objs <- ls()
Which of the following commands will replicate the contents of my.objs?
Consider the following matrix M:
## [,1] [,2] [,3] [,4]
## [1,] 0.05877592 0.44089085 0.03588032 0.92811567
## [2,] 0.32488110 0.72529507 0.60857311 0.86124107
## [3,] 0.38521361 0.31460310 0.76853703 0.45059843
## [4,] 0.11466091 0.14062238 0.57652902 0.72667310
## [5,] 0.06326620 0.25311898 0.28326427 0.35534147
## [6,] 0.63639758 0.18581595 0.87131400 0.98908937
## [7,] 0.59556407 0.04686155 0.63940459 0.96646293
## [8,] 0.85630432 0.90281913 0.20985255 0.05959985
## [9,] 0.43056251 0.34906547 0.84502167 0.17460772
## [10,] 0.68934234 0.16849662 0.15081856 0.14233580
If we created a new object called sol.q with the code:
my.nrows <- nrow(M)
my.ncols <- ncol(M)
sol.q <- M[my.ncols-1,my.ncols]
What would be its contents? (You should be able to find the solution without any coding.)
Consider the creation of the following matrix M:
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.1201229 0.434687412 0.310296813 0.6390633 0.41032515
## [2,] 0.6505679 0.766424245 0.006963576 0.9698208 0.26285632
## [3,] 0.6896632 0.002263119 0.073638518 0.7803911 0.49426297
## [4,] 0.3699128 0.550947419 0.093738238 0.5812018 0.19406348
## [5,] 0.4542084 0.385360823 0.652605383 0.5900362 0.03681814
## [6,] 0.9265343 0.148605819 0.563453534 0.3018942 0.53461072
## [7,] 0.3975704 0.409763288 0.306174217 0.4631648 0.69532238
## [8,] 0.2759949 0.434012420 0.007718076 0.9012946 0.50714389
## [9,] 0.4225392 0.942006207 0.411977491 0.9014780 0.13591365
If we executed in R the commands dim(M)[1], ncol(M) and length(M) in that exact order, what would be the result?
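The dimension functions used in the two matrix questions can be tried on a small illustrative matrix first. This is only a sketch with a hypothetical object demo.M, not the matrices shown in the exercises.

```r
# 2 rows and 3 columns, filled column by column with 1..6
demo.M <- matrix(1:6, nrow = 2)

n.rows <- dim(demo.M)[1]   # number of rows: 2
n.cols <- ncol(demo.M)     # number of columns: 3
n.elem <- length(demo.M)   # total number of elements: 6

# [row, column] indexing: second row, third column
one.cell <- demo.M[2, 3]   # 6

print(c(n.rows, n.cols, n.elem, one.cell))
```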
If we executed the following code in R, what would be the result?
my.f <- list.files(path = 'data', pattern = '*.csv',full.names = TRUE)
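To see what list.files returns, the call can be tried on a throwaway folder. The sketch below assumes a temporary directory and hypothetical file names; it also uses the stricter pattern '\\.csv$', since the pattern argument is a regular expression rather than a shell wildcard.

```r
# Build a small temporary folder with hypothetical files
demo.dir <- file.path(tempdir(), 'demo_data')
dir.create(demo.dir, showWarnings = FALSE)
file.create(file.path(demo.dir, c('a.csv', 'b.csv', 'notes.txt')))

# full.names = TRUE returns paths that include the folder
demo.f <- list.files(path = demo.dir, pattern = '\\.csv$', full.names = TRUE)

# The result is a character vector of file paths
print(basename(demo.f))  # "a.csv" "b.csv"
```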
A student executed the following R code:
set.seed(my.seed)
y <- sample(1:10, 3)
y <- y[c(1,3)]
x <- y[2]
If my.seed = 98, what is the content of object x?
If you are working in a directory of your computer called “C:/MyCode” and want to change the directory to a subfolder called “data”, which of the following commands will do that?
Read the data in file Brazil_footbal_games.csv, which contains information about football games of the Brazilian national team taken from Google.
After loading up the dataset, what is the sum of all values in column GolsBr?
Based on the dates of the games, what is the percentage of games played on a Saturday?
Based on the net result of goals, what is the percentage of games where the local team was victorious?
Calculate the sum of all goals of the local Brazilian team and the sum of all goals against it. By taking the difference of the first from the latter, what is the result?
With the imported table, create a new column with the absolute (signless) difference of goals for each game. Now, look for the game with the highest absolute difference of goals. Which country and date do you find?
Read the data in file /home/msperlin/R/x86_64-pc-linux-gnu-library/3.6/afedR/extdata/data/Ibov_long_2010-01-01_2018-09-12.rds
and, based on it, answer the following question: what is the date with the largest historical return for the Ibovespa index?
What is the total return of the Ibovespa index during the period in the imported dataset?
Considering the data for the Ibovespa index, how many times did the sign of the return series repeat itself? That is, on how many days is a positive (or negative) return followed by another positive (or negative) return?
Import the data available in file SP500_comp_long_2014-10-17_2019-10-16.rds and filter the table so it only keeps rows for stock GOOGL. What is the largest price found for the resulting table?
Using the same data as in the previous exercise, filter the table for ticker MU. Find the oldest and the most recent date available in the resulting table. What is the number of days between these dates?
For this exercise, filter the raw stock table for ticker EXR. Based on the resulting data, what is the date on which the stock price is closest to 90.5?
Using data for all stocks available in the raw table, how many stocks use the letter F in their ticker?
Given the following variables: $x=-21$ $y=-6$ $z=2$
Use the R prompt to calculate $x*y+z$. What is the result?
Assuming that you bought 91 shares of a stock traded at 10 dollars. After some time, you sold 29 shares at 11 dollars and the remaining shares were sold for 10 dollars. Using an R script, structure this financial problem with the creation of numerical objects in R and, based on these, calculate the gross profit of the operation in the stock market. What is the result?
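One possible way to structure this kind of problem is to store each quantity and price in its own object and derive the profit from them. The sketch below uses illustrative numbers that differ from the exercise, and all object names are hypothetical.

```r
# Purchase: illustrative values, not those of the exercise
n.bought  <- 100
buy.price <- 20

# Two sales that together exhaust the position
n.sold.1     <- 40
sell.price.1 <- 22
n.sold.2     <- n.bought - n.sold.1   # remaining 60 shares
sell.price.2 <- 19

# Gross profit = total sale proceeds - total purchase cost
total.cost     <- n.bought * buy.price
total.proceeds <- n.sold.1 * sell.price.1 + n.sold.2 * sell.price.2
gross.profit   <- total.proceeds - total.cost

print(gross.profit)  # 20
```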
Use the following code to create vectors x and y in R:
[1] 3 1 1 1 1
[1] 3 1 1 2 1
What is the sum of the elements of the new vector that results from the element-wise multiplication of x and y?
Create a vector x with the following formula where $i=1...336$: $x=\frac{ (-1)^{i+1}}{2i-1}$
What is the sum of the elements of x?
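The formula can be written directly with vectorized arithmetic. As a sanity check, note that these terms are the partial sums of the Leibniz series, which converges to $\pi/4$, so the result should sit close to that value. The object names below are illustrative.

```r
# Index vector i = 1, ..., 336
i <- 1:336

# Element-wise evaluation of (-1)^(i+1) / (2i - 1)
x <- (-1)^(i + 1) / (2 * i - 1)

sum.x <- sum(x)

# Alternating series bound: the error is smaller than the first omitted term
close.to.pi4 <- abs(sum.x - pi / 4) < 1 / (2 * 337 - 1)
print(close.to.pi4)  # TRUE
```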
Using R and my.seed = 78, create vectors x and y with the next chunk of code:
> set.seed(my.seed)
> x <- sample(-100:100, 250, replace=T)
> y <- sample(-100:100, 250, replace=T)
What is the number of positive elements in x?
What is the number of cases where x and y are positive simultaneously?
If we performed a cumulative sum in vector x, how many elements would have to be summed so that the result is higher than 88?
If you wanted to combine the text abc with def so that the result is abc--def, which of the following commands should be used?
paste('abc-','-def', sep='-')
print('abc','def', sep='-')
paste('abc','def', sep='--')
cat('abc','def', sep='-')
paste0('abc','def')
Use functions paste and seq to create the following atomic vector of characters of size 392:
label 1, label 2, ... ,label 392
What is the sum of characters for all elements of the created vector?
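A minimal sketch of the text functions involved in the last two questions, using a short illustrative vector of five labels instead of 392. The object names are hypothetical.

```r
# paste() joins its arguments with a single space by default
demo.labels <- paste('label', seq(1, 5))
print(demo.labels)  # "label 1" "label 2" "label 3" "label 4" "label 5"

# nchar() counts characters per element; each label here has 7
total.chars <- sum(nchar(demo.labels))  # 35

# sep controls the separator; paste0() uses no separator at all
joined <- paste('x', 'y', sep = '_')  # "x_y"
glued  <- paste0('x', 'y')            # "xy"
```

Note that cat() and print() display text but do not return the combined string, so they are not substitutes for paste().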
Create the following object in R with my.seed = 68:
> set.seed(my.seed)
> my.char <- paste(sample(letters,5000,replace = T), collapse = '')
How many times can the letter "g" be found in the resulting text object?
Based on object my.char created in the previous question, if we split it into several smaller pieces using the letter "d" as the separator, what is the number of characters in the largest piece?
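Counting a letter and splitting a string on a separator can both be done with strsplit. The sketch below works on a short hand-written string (hypothetical object demo.txt) rather than the random text of the exercise, so the result is easy to check by eye.

```r
demo.txt <- 'abdccccdde'

# Split into single characters and count the matches
n.d <- sum(strsplit(demo.txt, '')[[1]] == 'd')  # 3

# Split on the separator; consecutive separators yield an empty piece
pieces <- strsplit(demo.txt, 'd')[[1]]
print(pieces)  # "ab" "cccc" "" "e"

# Size of the largest piece
largest <- max(nchar(pieces))  # 4
```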
Create the following factor object in R with my.seed=100:
> set.seed(my.seed)
> my.factor <- factor(sample(1:1000,100))
After converting this object of type factor to numerical values, what is the sum of the elements of the resulting object?
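Converting a factor to numbers has a well-known pitfall: as.numeric() applied directly returns the internal level codes, not the original values. The sketch below shows both conversions on a small illustrative factor; which of the two the exercise intends is not stated, so this is only a reminder of the difference.

```r
demo.factor <- factor(c(30, 10, 20))

# Direct conversion gives the level codes (levels are sorted: 10, 20, 30)
codes <- as.numeric(demo.factor)                 # 3 1 2

# Converting through character recovers the original values
values <- as.numeric(as.character(demo.factor))  # 30 10 20

print(c(sum(codes), sum(values)))  # 6 60
```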
What is the date 84 days after 2020-11-08?
Use R to create a sequence of dates between 2022-03-28 and 2023-08-10 every three days. What is the quantity of Wednesdays in the resulting vector?
What is the date and time 9806 seconds after 2019-12-02 14:00:17?
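The three date questions rely on date arithmetic, seq() with a by argument, and adding seconds to a date-time. Here is a minimal sketch on illustrative dates from January 2021. It uses format(x, '%u') (ISO weekday number, Monday = 1) instead of weekdays(), as an assumption made to keep the result independent of the system language.

```r
d0 <- as.Date('2021-01-01')

# Adding an integer to a Date adds that many days
d1 <- d0 + 30                        # "2021-01-31"

# Every third day between the two dates
d.seq <- seq(d0, d1, by = '3 days')  # 11 dates

# '%u' gives the weekday as a number, 3 = Wednesday
n.wed <- sum(format(d.seq, '%u') == '3')  # 1 (only 2021-01-13)

# Adding a number to a POSIXct object adds seconds
t0 <- as.POSIXct('2021-01-01 00:00:00', tz = 'UTC')
t1 <- t0 + 3600
t1.txt <- format(t1, '%Y-%m-%d %H:%M:%S')  # "2021-01-01 01:00:00"
```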
Using my.seed = 40, create a matrix with the following code:
set.seed(my.seed)
my.nrow <- 100
my.mat <- matrix(runif(my.nrow*10), nrow = my.nrow )
Which row of the matrix has the highest sum of elements? Tip: You can easily solve it using a loop or function apply.
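The tip about apply can be sketched on a small hand-made matrix (hypothetical object demo.mat), where the row sums are easy to verify manually.

```r
demo.mat <- matrix(c(1, 2, 3,
                     9, 1, 1,
                     0, 0, 5), nrow = 3, byrow = TRUE)

# MARGIN = 1 applies the function to each row
row.sums <- apply(demo.mat, 1, sum)  # 6 11 5

# which.max returns the position of the first maximum
best.row <- which.max(row.sums)      # 2
print(best.row)
```

Function rowSums(demo.mat) would give the same row totals without apply.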
Using my.seed = 50, create a vector with the following code:
set.seed(my.seed)
N <- 10000
my.x <- runif(N)*sample(c(-0.5, 1), size = N, replace = TRUE)
Using a while command, what is the first element where the cumulative sum of my.x reaches a value higher than 24?
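A while loop for this kind of search keeps adding elements until the running total passes the threshold. The sketch below uses a short illustrative vector and a threshold of 5; the names demo.x, threshold, running.sum and idx are hypothetical.

```r
demo.x    <- c(1, -0.5, 2, 3, -1, 4)
threshold <- 5

running.sum <- 0
idx <- 0

# Stop as soon as the running total exceeds the threshold
# (the second condition guards against running past the end of the vector)
while (running.sum <= threshold && idx < length(demo.x)) {
  idx <- idx + 1
  running.sum <- running.sum + demo.x[idx]
}

print(idx)  # 4, since 1 - 0.5 + 2 + 3 = 5.5 > 5

# Vectorized cross-check
idx.check <- which(cumsum(demo.x) > threshold)[1]
```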
Import the data available in file FTSE.csv using function afedR::afedR_get_data_file. There you will find dates, prices and traded volume for the FTSE index. Now, calculate the average trading volume for each weekday (Monday, Tuesday, ..). What is the weekday with the lowest traded volume?
Using the same data from the previous exercise, file FTSE.csv, test whether the mean traded volume of the weekday with the lowest average differs from that of the weekday with the highest average. For that, use a simple t-test (function t.test).
What is the p-value of the test?
Load the data from file Brazil_footbal_games.csv using afedR::afedR_get_data_file. This file contains data taken from Google about games of the national football team.
Using loops or the dplyr package, find which team accumulated the highest number of victories over the national team. In case of a tie, use the alphabetical order of team names for the solution.
For the same data found within file /home/msperlin/R/x86_64-pc-linux-gnu-library/3.6/afedR/extdata/data/Brazil_footbal_games.csv, what is the score (BR X ADV) of the last game against the team with the highest number of defeats against the Brazilian team? Once again, in case of a tie, use the alphabetical order of team names.
Using the previous tables (files Brazil_footbal_games.csv and Ibov_long_2010-01-01_2018-09-12.rds), create a table where each row corresponds to a game and the columns are:
Question: what is the quantity of positive returns in the fourth column of the resulting table?
Tip: The simplest way to solve the problem is using loops. Iterate each game/row of the main table and do the required calculations.
Based on the table created previously, what is the quantity of positive returns of the Ibovespa index on the dates before the games of the national team?
Considering only the games where the Brazilian team was victorious, what is the percentage of positive returns on the date after the games?
Open RStudio and load the data from file /home/msperlin/R/x86_64-pc-linux-gnu-library/3.6/afedR/extdata/data/SP500_comp_YEARLY_long_2014-10-03_2019-10-02.rds using afedR::afedR_get_data_file. This file contains stock data for companies in the SP500 index in the yearly frequency. Using the programming tools of R, create a new table with three columns:
Which stocks have the lowest and highest standard deviation, respectively?
Note: You’ll have to deal with NA values and remove them from the original data.
For the data in file SP500_comp_YEARLY_long_2014-10-03_2019-10-02.rds, calculate the total return (final price/first price - 1) for each stock. Which one has the highest total return?
Load the data from file TDData_ALL_2019-10-02.rds. This database contains price and yield data for different fixed income debt contracts of the Brazilian government. Column ref.date indicates the reference date where the price/yield was registered at the end of the trading day and asset.code shows the name of the instrument. These debt contracts have a maturity date, meaning that they will eventually expire. Note that the expiration date is available within the name of the financial contract with format ddmmyy.
For your exercise, create a new dataframe with data exclusive for the following instruments:
Selected.debt.contracts |
---|
NTN-C 010705 |
LTN 011008 |
NTN-B 150509 |
NTN-F 010111 |
LTN 010120 |
NTN-B Principal 150519 |
NTN-B 150517 |
LTN 010112 |
NTN-B 150507 |
LTN 010116 |
After loading and filtering the data, what is the quantity of different dates?
Using the filtered data of 10 Brazilian debt instruments from the previous question, calculate the average price of each instrument. What is the sum of the average prices?
For the same previous data, what is the quantity of observations (rows) after date 2006-05-17?
For asset named NTN-B 150517, what is the price for the most recent available date?
Using the maturity date, what is the number of days between the oldest and the most recent date?
Using the same maturity column, what is the absolute number of days between the date with the least distance from 2017-12-02 and the date with the highest distance from the same date?
For each asset, calculate a measure of price volatility by taking the difference between the lowest and the maximum price. What is the sum of these differences?
Create a new column in the dataframe, which should show the type of debt instrument (LTN, NTN-B, NTN-C, LFT, ..). Such information is available within the name of the financial contract.
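Both the instrument type and the maturity date can be extracted from the contract name with simple pattern replacement. The sketch below is one possible approach on two names from the list above. Treating the first word as the type (so that "NTN-B Principal" becomes "NTN-B") is an assumption; the object names are hypothetical.

```r
demo.names <- c('LTN 010120', 'NTN-B Principal 150519')

# Type: drop everything from the first space onwards (assumption, see above)
asset.type <- sub(' .*', '', demo.names)        # "LTN" "NTN-B"

# Maturity: keep only what follows the last space, then parse as ddmmyy
maturity.txt <- sub('.* ', '', demo.names)      # "010120" "150519"
maturity <- as.Date(maturity.txt, format = '%d%m%y')

print(maturity)  # "2020-01-01" "2019-05-15"

# Differences between Date objects are measured in days
n.days <- as.numeric(max(maturity) - min(maturity))  # 231
```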
Using the same filtered dataset, what is the type of asset with the highest number of observations (rows)?
Create a new column in the dataframe with the day of the week by using the date information in column ref.date. After this operation, calculate the average price of each instrument according to the day of the week. What is the sum of average prices?
Using the information in column ref.date, create a new column in the dataframe with the month of the year (January, February, ..). Now, calculate the average price of each instrument based on the month and day of the week. The resulting dataframe should have four columns: month, day of the week, asset.code, avgprices.
For the resulting dataframe, what is the sum of the column with average prices?
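Grouped averages like these can be computed with aggregate() from base R (dplyr's group_by and summarise would work as well). The sketch below uses a tiny hand-made table; columns ref.date and asset.code follow the description above, while the column name price and all values are assumptions made for illustration.

```r
demo.df <- data.frame(
  asset.code = c('A', 'A', 'A', 'B'),
  ref.date   = as.Date(c('2021-01-04', '2021-01-11', '2021-01-05', '2021-01-04')),
  price      = c(10, 20, 30, 40)
)

# Weekday as a number (1 = Monday), independent of the system language
demo.df$wday <- format(demo.df$ref.date, '%u')

# Average price for each combination of instrument and weekday
avg.tab <- aggregate(price ~ asset.code + wday, data = demo.df, FUN = mean)
print(avg.tab)

# A on Mondays: 15, A on Tuesdays: 30, B on Mondays: 40
sum.avg <- sum(avg.tab$price)  # 85
```

Adding a month column, for instance with format(ref.date, '%m'), and including it in the formula extends the same idea to the month and weekday question.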
Consider an investor that bought 95 debt contracts of NTN-C 010705 on the launch day of the financial contract. If the investor sold the contracts on the last available date in the table, what would be the gross profit?
For the same investor of the previous question, what is the percentage return of the operation, after a tax rate of 15 percent over the gross return?
Consider an investor that bought 98 contracts of each of the debt instruments on their launch days and held them until the last available date in the table. What is the gross profit of the operation?
This time, consider an investor that bought 55 contracts of each debt instrument on the launch day. If the investor sold each one at its highest available price, what would be the gross profit?
Load the data in file /home/msperlin/R/x86_64-pc-linux-gnu-library/3.6/afedR/extdata/data/SP500-Stocks_long.csv, which contains the trading price of different assets. For your exercise, consider only the following assets:
Selected.stocks |
---|
LVLT |
RIG |
YHOO |
FLIR |
CERN |
ABC |
MS |
EXPE |
FE |
UNM |
and exclude all others.
What is the number of rows in the resulting dataframe?
For the same table as the previous exercises, what is the number of observations on a Wednesday?
For the same filtered dataframe from previous exercises, find the date on which each stock reached its highest price. Among those, what is the most recent date?
With the same dataframe, add a new column with the arithmetic return ($R_t = P_t/P_{t-1} - 1$) of the stocks. In this new column, what is the lowest return for all stocks?
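Because the table stacks several stocks, the return $R_t = P_t/P_{t-1} - 1$ has to be computed within each ticker so that the first price of one stock is not divided by the last price of another. A minimal base R sketch follows, assuming rows are already sorted by date within each ticker; the data and the names demo.df and calc.ret are hypothetical.

```r
demo.df <- data.frame(
  ticker = c('AAA', 'AAA', 'AAA', 'BBB', 'BBB'),
  price  = c(100, 110, 99, 50, 55)
)

# Arithmetic return; the first observation has no previous price, hence NA
calc.ret <- function(p) c(NA, p[-1] / p[-length(p)] - 1)

# ave() applies the function separately to each ticker
demo.df$ret <- ave(demo.df$price, demo.df$ticker, FUN = calc.ret)
print(demo.df)

# Lowest return across all stocks, ignoring the NA values
min.ret <- min(demo.df$ret, na.rm = TRUE)  # approximately -0.1
```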
Consider the data in file Ibov_long_2010-01-01_2018-09-12.rds. What is the sum of the $15$ largest daily returns of the Ibovespa index?
Load the data from files IbovComp_long_2015-01-01_2019-11-10.rds and Ibov_long_2010-01-01_2018-09-12.rds. Now, using the Ibovespa returns (ticker = BVSP), add a new column to the stock dataframe with the returns of the index. For each asset and each date, calculate the excess return of the stock, that is, the difference between the stock return and the index return.
Based on the excess return column, what is its sum when ignoring the NA values?
Consider the raw table from file IbovComp_long_2015-01-01_2019-11-10.rds. Based on it, assume that an investor carried out the following trade operations:
ticker | buy.date | sell.date | n.contracts |
---|---|---|---|
ABEV3 | 2015-05-08 | 2018-01-25 | 27 |
B3.SA | 2015-06-08 | 2019-07-16 | 11 |
BBDC4 | 2015-09-02 | 2017-08-21 | 20 |
BBSE3 | 2016-08-24 | 2018-09-17 | 45 |
BRAP4 | 2015-06-12 | 2019-09-03 | 19 |
BRFS3 | 2016-08-31 | 2017-10-24 | 36 |
BRKM5 | 2015-09-25 | 2019-08-02 | 27 |
BRML3 | 2016-09-26 | 2018-09-24 | 23 |
BTOW3 | 2017-03-06 | 2019-08-23 | 36 |
BVSP | 2016-06-16 | 2018-04-27 | 27 |
CIEL3 | 2015-11-27 | 2019-02-20 | 30 |
CMIG4 | 2016-12-02 | 2018-03-14 | 42 |
CSNA3 | 2015-07-08 | 2019-10-25 | 12 |
CVCB3 | 2016-01-15 | 2017-11-22 | 34 |
CYRE3 | 2015-04-15 | 2018-03-14 | 12 |
ECOR3 | 2016-03-23 | 2017-10-09 | 43 |
EGIE3 | 2016-08-10 | 2018-04-25 | 37 |
ELET6 | 2016-12-07 | 2017-10-06 | 46 |
EMBR3 | 2017-01-06 | 2019-05-09 | 20 |
ENBR3 | 2017-05-15 | 2017-06-29 | 33 |
FLRY3 | 2015-03-10 | 2017-09-08 | 28 |
GOAU4 | 2015-04-27 | 2019-04-22 | 41 |
GOLL4 | 2015-04-10 | 2018-03-21 | 10 |
HYPE3 | 2016-03-28 | 2018-10-30 | 49 |
I4.SA | 2015-11-13 | 2017-08-21 | 11 |
ITUB4 | 2017-02-01 | 2019-10-10 | 24 |
JBSS3 | 2016-05-16 | 2019-05-31 | 37 |
LAME4 | 2017-01-02 | 2019-03-13 | 24 |
LREN3 | 2015-03-26 | 2018-10-29 | 23 |
MRFG3 | 2015-11-18 | 2019-04-25 | 20 |
MRVE3 | 2015-09-21 | 2018-08-21 | 35 |
MULT3 | 2015-06-18 | 2019-05-30 | 11 |
N3.SA | 2017-03-31 | 2018-09-26 | 13 |
NATU3 | 2016-11-29 | 2017-08-23 | 23 |
PCAR4 | 2016-01-08 | 2017-10-16 | 27 |
PETR3 | 2015-07-27 | 2017-09-12 | 26 |
PETR4 | 2016-06-08 | 2018-05-02 | 24 |
QUAL3 | 2015-09-28 | 2019-01-15 | 48 |
RADL3 | 2015-07-13 | 2018-05-25 | 33 |
SANB11 | 2015-04-02 | 2019-04-26 | 33 |
SBSP3 | 2015-10-21 | 2018-12-07 | 24 |
SMLS3 | 2016-01-13 | 2017-11-06 | 16 |
SUZB3 | 2017-05-05 | 2018-01-18 | 45 |
TIMP3 | 2016-04-05 | 2019-03-19 | 30 |
UGPA3 | 2017-02-22 | 2019-05-21 | 48 |
USIM5 | 2016-09-05 | 2018-06-13 | 48 |
VALE3 | 2015-02-13 | 2017-12-05 | 47 |
VIVT4 | 2016-10-17 | 2017-12-15 | 31 |
VVAR3 | 2017-02-21 | 2018-10-08 | 43 |
WEGE3 | 2016-05-20 | 2018-01-09 | 11 |
Based on this information, what is the total gross profit of the investor?
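One way to approach this, sketched here with merges rather than loops, is to look up the price on each buy date and each sell date and then multiply the price difference by the number of contracts. Everything in the sketch is illustrative: the two-row trade table, the prices, and the column name price are assumptions, and the real price column in the file may be named differently.

```r
# Hypothetical price table in long format
demo.prices <- data.frame(
  ticker   = c('AAA', 'AAA', 'BBB', 'BBB'),
  ref.date = as.Date(c('2020-01-02', '2020-01-10', '2020-01-03', '2020-01-15')),
  price    = c(10, 12, 50, 45)
)

# Hypothetical trade table with the same layout as the one above
demo.trades <- data.frame(
  ticker      = c('AAA', 'BBB'),
  buy.date    = as.Date(c('2020-01-02', '2020-01-03')),
  sell.date   = as.Date(c('2020-01-10', '2020-01-15')),
  n.contracts = c(5, 1)
)

# Attach the price observed on the buy date
tab <- merge(demo.trades, demo.prices,
             by.x = c('ticker', 'buy.date'), by.y = c('ticker', 'ref.date'))
names(tab)[names(tab) == 'price'] <- 'buy.price'

# Attach the price observed on the sell date
tab <- merge(tab, demo.prices,
             by.x = c('ticker', 'sell.date'), by.y = c('ticker', 'ref.date'))
names(tab)[names(tab) == 'price'] <- 'sell.price'

# Profit per trade and in total: 5 * (12 - 10) + 1 * (45 - 50) = 5
tab$profit   <- tab$n.contracts * (tab$sell.price - tab$buy.price)
total.profit <- sum(tab$profit)
print(total.profit)  # 5
```

A trade whose buy or sell date has no matching price row is silently dropped by merge(), so comparing nrow(tab) with the number of trades is a useful check.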
Tip: save the trade table to an Excel or csv file first, and later import it in the R code for the calculations.
Consider the financial data available in file SP500_comp_long_2014-10-17_2019-10-16.rds and load it in R. Notice that the actual market index is included as ticker ^GSPC. After importing the dataset, filter out any row with NA values (you can use function complete.cases() or na.omit() for that). From the clean data, select the following stocks and exclude the rest from the dataset: RE, MCO, HP, CHD, T, ARNC, DAL, GM, ANSS, LKQ, CLX, PG, LB, ALL, CF, CPRI, ADSK, OXY, HII, JWN, BMY, KSU, MDLZ, INTU, JPM
Using arithmetic returns from the adjusted prices, what is the value of beta for stock GM? $R_t = \alpha + \beta R_{M,t} + \epsilon _t$
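The beta model is a simple linear regression of the stock return on the market return, so it can be estimated with lm(). The sketch below uses made-up return vectors constructed so that the true coefficients are known ($\alpha = 0.001$, $\beta = 1.5$); with real data the two series would first have to be aligned by date.

```r
# Hypothetical market returns
r.mkt <- c(0.01, -0.02, 0.015, 0.03, -0.01, 0.005)

# Stock returns built exactly from the model, without noise
r.stock <- 0.001 + 1.5 * r.mkt

# R_t = alpha + beta * R_M,t + e_t
demo.fit <- lm(r.stock ~ r.mkt)

alpha.hat <- unname(coef(demo.fit)['(Intercept)'])
beta.hat  <- unname(coef(demo.fit)['r.mkt'])

print(c(alpha.hat, beta.hat))
```

The variations asked for in the following questions change only the formula, for example adding an interaction with a Monday dummy or replacing the market return with its lag.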
Using the previously filtered data, estimate the following model for stock ADSK.
$R_t = \alpha + (\beta _1 + \beta_2 D_t) R_{M,t} + \epsilon _t$
Where parameter $D_t$ is a dummy that takes value 1 if the weekday in time $t$ is Monday and 0 otherwise. What is the value of $\beta_2$?
Using the previously filtered data, estimate the following model with lagged variables for stock JWN.
$R_t = \alpha + \beta _1 R_{M,t-1} + \epsilon _t$
What is the value of $\beta_1$?
Using the same data as previous exercises, estimate the following model for stock LB.
$R_t = \beta _1 R_{M,t} + \sum ^{5} _{i=1} \theta _i WeekDummy _{i,t} + \epsilon _t$
Where $WeekDummy _{i,t}$ is a dummy variable that takes value 1 if the day of the week is $i$ (Monday to Friday). What is the estimated value of parameter $\theta_3$?
Consider the estimation of the beta model for stock CLX. Using CRAN package car, do a linear hypothesis test for $\alpha = 0$ and $\beta = 0.4$. What is the p-value of the test?
Using the same model as before, do a Durbin Watson test of serial correlation. What is the p-value of the test?
Consider two stocks: OXY and CHD. Based on the return dataset, estimate the following linear model:
$R_{OXY,t} = \alpha + \beta _1 R_{CHD,t} + \epsilon _t$
What is the value of $\beta_1$?
After estimating a beta model for each of the assets in the dataset, use package lmtest to perform the Durbin Watson test for all linear models. How many of the 25 stocks reject the null hypothesis of the test at 5%?
Using the filtered dataframe for 25 stocks created earlier, calculate the beta coefficient (systematic risk) for all available stocks. What is the average value of beta across stocks?
Using the information from the previous exercise, what is the minimum value of alpha for all 25 stocks?
Consider the following GLM model:
$E \left( P(R _t > 0) \right) = g \left(\alpha + \beta _1 R_{Mkt,t} \right)$
where $P(R _t > 0)$ is the probability that the return of a particular stock is higher than 0 and g() is the probit function. Using the data for stock HP, what is the value of $\beta _1$?
Using set.seed(82) and function arima.sim, simulate 333 observations of the following ARMA model: $y_t = 0.4 y_{t-1} - 0.09 \epsilon _{t-1} + \epsilon _{t}$
$\epsilon _{t} \sim N(0, 1.44)$
How many observations of the simulated model are higher than 0?
Given the simulated time series from the previous exercise, estimate an ARIMA(1,0,1) model. What is the value of the AR parameter?
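A minimal sketch of how arima.sim and arima fit together for this model. Two assumptions are made and should be checked against the course material: the second argument of $N(0, 1.44)$ is read as a variance, so the innovations use sd = sqrt(1.44) = 1.2, and the seed below is arbitrary, so the numbers will not match the exercise.

```r
set.seed(1)  # arbitrary seed, chosen only for this illustration

# R's convention is y[t] = ar * y[t-1] + e[t] + ma * e[t-1],
# so the negative MA coefficient is passed as it appears in the equation
demo.y <- arima.sim(model = list(ar = 0.4, ma = -0.09),
                    n = 333, sd = sqrt(1.44))

# Number of simulated observations above zero
n.pos <- sum(demo.y > 0)

# Estimate an ARIMA(1,0,1) on the simulated series
demo.fit <- arima(demo.y, order = c(1, 0, 1), method = 'ML')
ar.hat <- unname(coef(demo.fit)['ar1'])

print(c(n.pos, ar.hat))
```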
Load the data in file SP500_comp_long_2014-10-17_2019-10-16.rds and, based on the data for stock EMN, estimate an ARIMA(3, 0, 1) model for the returns of adjusted prices of the asset. What is the sum of the AR parameters?
Based on the previously estimated model for stock EMN, what is the forecasted return for t+1?
Use function forecast::auto.arima to estimate the best model for the adjusted returns of stock BDX using the AIC criterion (see input ic in auto.arima). What is the optimal lag for the AR parameter?
Using the daily arithmetic return data for all 60 stocks available in file IbovComp_long_2015-01-01_2019-11-10.rds (you need to create the return column), estimate the best arima model for each asset with function forecast::auto.arima using a maximum order (input max.order) of 5. How many stocks have AR lag order equal to 0?
Based on the arima models estimated in the previous exercise, perform, for all assets, a t+1 forecast for the conditional mean. Which stock has the highest value of forecasted return at time t+1?
Consider the following ARMA-GARCH process: $y _t = 0.1 y_{t-1} - 0.05 \epsilon_{t-1} + \epsilon _t$
$\epsilon _t \sim N \left(0, h _t \right )$
$h _t = 10^{-4} + 0.1 \epsilon ^2 _{t-1}+ 0.8 h_{t-1}$
Using set.seed(16), simulate 642 observations of the model. What is the result for the last simulated value in the series?
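To make the recursion explicit, the process can be simulated by hand in base R. This is only a sketch of the equations above: the function name, its arguments, the arbitrary seed and the choice of starting the variance at its unconditional level are all assumptions, and a dedicated package routine will generally draw its random numbers differently, so the values produced here are not the expected answer.

```r
simulate.arma.garch <- function(n, ar, ma, omega, alpha, beta, seed) {
  set.seed(seed)
  z   <- rnorm(n)     # standard normal shocks
  h   <- numeric(n)   # conditional variance
  eps <- numeric(n)   # innovations, eps[k] ~ N(0, h[k])
  y   <- numeric(n)   # simulated series

  # Start the variance at its unconditional level (assumption)
  h[1]   <- omega / (1 - alpha - beta)
  eps[1] <- sqrt(h[1]) * z[1]
  y[1]   <- eps[1]

  for (k in 2:n) {
    # GARCH(1,1) variance equation
    h[k]   <- omega + alpha * eps[k - 1]^2 + beta * h[k - 1]
    eps[k] <- sqrt(h[k]) * z[k]
    # ARMA(1,1) mean equation
    y[k]   <- ar * y[k - 1] + ma * eps[k - 1] + eps[k]
  }
  y
}

demo.sim <- simulate.arma.garch(n = 642, ar = 0.1, ma = -0.05,
                                omega = 1e-4, alpha = 0.1, beta = 0.8,
                                seed = 1)
last.value <- demo.sim[length(demo.sim)]
```

Repeating the call with different seeds and collecting the value at a fixed time index gives the Monte Carlo setup described in the next question.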
Consider the following ARMA-GARCH process: $y _t = 0.15 y_{t-1} - 0.15 \epsilon_{t-1} + \epsilon _t$
$\epsilon _t \sim N \left(0, h _t \right )$
$h _t = 10^{-4} + 0.1 \epsilon ^2 _{t-1}+ 0.7 h_{t-1}$
Using set.seed(77), do 1000 simulations of the process for 500 time periods. Looking at the simulated value at time 35 for all simulations, what is the maximum value found across all simulations?
Using package fGarch, estimate an ARMA(1,1)-GARCH(1,1) model for the returns of stock UNH, available in file SP500_comp_long_2014-10-17_2019-10-16.rds. What is the value of the ARCH coefficient (alpha1)?