R

New blog site: From Jekyll to Hugo

A while ago I wrote about purchasing my own webserver on Digital Ocean and hosting my shiny applications. Last week I finally got some time to migrate my blog from GitHub to my new domain, www.msperlin.com. While doing that, I also decided to change the technology behind the blog, from Jekyll to Hugo. Here are my reasons. Jekyll is great for making simple static sites, especially with this template from Dean Attali.

Some Useful Tricks in RStudio

I’ve been using RStudio for a long time and I have some tricks to share. These are simple and useful commands and shortcuts that really help the productivity of my students. If you have a trick to suggest, use the comment section and I’ll add it to this post. Package rstudioapi: when using RStudio, package rstudioapi gives you lots of information about your session. The most useful one is the script location.
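A minimal example of the script location trick (this only works when run from within RStudio):

library(rstudioapi)

# path of the script currently open in the editor
my.path <- rstudioapi::getActiveDocumentContext()$path
print(my.path)

# a common use: set the working directory to the script's own folder
setwd(dirname(my.path))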

Loops and Pizzas

Loops in R: first, if you are new to programming, you should know that loops are a way to tell the computer to repeat an operation a number of times. This is a very common task found in many programming languages. For example, let’s say you invited five friends for dinner at your home and the whole cost of four pizzas will be split evenly.
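A toy version of the pizza example with a loop (the price and number of guests below are just illustrative):

pizza.cost <- 4*25     # four pizzas at 25 each (assumed price)
n.people <- 6          # you plus five friends
share <- pizza.cost/n.people

# print each person's share of the bill
for (i.person in 1:n.people) {
  cat('Person', i.person, 'pays', share, '\n')
}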

New package in CRAN: PkgsFromFiles

It’s been a while since I developed a CRAN package and this weekend I decided to work on an idea I had some time ago. The result is package PkgsFromFiles. When working with different computers at home or work, one of the problems I face is installing missing packages across different machines. As an example, a script that works on my work computer may not work on my home computer. This is especially annoying when I have a fresh install of the operating system or R.
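A rough base-R sketch of the idea behind the package, not its actual interface: scan a folder of .R scripts for library()/require() calls and install whatever is missing.

find_and_install <- function(folder) {
  files <- list.files(folder, pattern = '\\.R$',
                      recursive = TRUE, full.names = TRUE)
  txt <- unlist(lapply(files, readLines, warn = FALSE))

  # grab the package names inside library() / require() calls
  hits <- regmatches(txt, regexpr('(library|require)\\([A-Za-z0-9.]+\\)', txt))
  pkgs <- unique(gsub('(library|require)\\(|\\)', '', hits))

  missing.pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing.pkgs) > 0) install.packages(missing.pkgs)
  return(missing.pkgs)
}

find_and_install('~/my-scripts')   # hypothetical folder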

Update to GetLattesData

Last year I released GetLattesData. This package is very handy for anyone who researches bibliometric data of Brazilian scholars. You could easily import the whole academic history of any researcher registered on the platform. More details about Lattes and GetLattesData in this post. However, a couple of months ago CNPQ introduced a captcha on the webpage. This made it impossible to download the xml files directly, breaking my code. It seems that those changes are now permanent.

BatchGetSymbols 2.2

One of the main requests I get for package BatchGetSymbols is to add a choice of frequency for the financial dataset. Today I finally got some time to work on it. I just posted a new version of BatchGetSymbols on CRAN. The major change is that users can now set the time frequency of the financial data: daily, weekly, monthly or yearly. Let’s check it out:
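Below is a minimal sketch of the new option; the ticker and dates are placeholders, not the ones from the original post, and the output column names are assumed from the package documentation.

library(BatchGetSymbols)
library(ggplot2)

# download monthly prices for a single ticker; freq.data is the new argument
df <- BatchGetSymbols(tickers = '^GSPC',
                      first.date = as.Date('2018-01-01'),
                      last.date  = as.Date('2019-01-01'),
                      freq.data  = 'monthly')

# df$df.tickers holds the price data
ggplot(df$df.tickers, aes(x = ref.date, y = price.close)) +
  geom_line()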

Benchmarking a SSD drive in reading and writing files with R

I recently bought a new computer for home and it came with two drives, one HDD and one SSD. The latter is used for the OS and the former stores all of my personal files. Of all the computers I have had, both at home and at work, this is definitely the fastest. While some of the merit is due to the newer CPU and RAM, the SSD drive can make all the difference in file operations.
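A simple sketch of the kind of benchmark described in the post, timing how long it takes to write and read a large csv file on each drive (the folder paths are assumptions; point them to folders on your own HDD and SSD):

n <- 5e6
df <- data.frame(x = runif(n), y = runif(n))

time_drive <- function(folder) {
  f <- file.path(folder, 'benchmark-test.csv')
  t.write <- system.time(write.csv(df, f, row.names = FALSE))['elapsed']
  t.read  <- system.time(read.csv(f))['elapsed']
  file.remove(f)
  c(write = t.write, read = t.read)
}

time_drive('D:/hdd-folder')   # hypothetical HDD path
time_drive('C:/ssd-folder')   # hypothetical SSD path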

Second Edition of "Processamento e Análise de Dados Financeiros e Econômicos com o R"

It is with great pleasure that I announce the second edition of the Portuguese version of my book, Processing and Analyzing Financial Data with R. This edition updates the material significantly. The Portuguese version is now not only on par with the international version of the book, but goes beyond it! Here are the main changes: the structure of chapters now follows the stages of a research project, from obtaining the raw data, to cleaning and manipulating it and, finally, reporting tables and figures.

Investing for the Long Run

I often get asked about how to invest in the stock market. Not surprisingly, this has been a common topic in my classes. Brazil is experiencing a big change in its financial scenario. Historically, fixed income instruments paid a large premium over the stock market, and that is no longer the case. Interest rates are low, without pressure from inflation. This means a more sustainable scenario of low interest rates in the future.

Predatory Journals and R

My paper about the penetration of predatory journals in Brazil, Is predatory publishing a real threat? Evidence from a large database study, was just published in Scientometrics! The working paper version is available on SSRN. This is a nice example of a data-intensive scientific work cycle, from gathering data to reporting results. Everything was done in R, using web scraping algorithms, parallel processing, tidyverse packages and more. This was a special project for me, given its implications for science in Brazil.

Writing papers about packages

Back in 2007 I wrote a Matlab package for estimating regime switching models. I was just starting to learn to code and this project was my way of doing it. After publishing it in FEX (the Matlab file exchange site), I got so many repeated questions by email that I eventually realized it would be easier to write a manual for people to read. Some time and effort would be spent writing it, but less time replying to repeated questions in my inbox.

Major update to BatchGetSymbols

I just released a long-overdue update to package BatchGetSymbols. The files are under review on CRAN and you should get the update soon. Meanwhile, you can install the new version from GitHub:

if (!require(devtools)) install.packages('devtools')
devtools::install_github('msperlin/BatchGetSymbols')

The main innovations are: Clever cache system: by default, every new download of data is saved in a local file located in a directory chosen by the user. Every new request for data is compared against the available local information.
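A short sketch of how the cache is meant to be used; the do.cache and cache.folder argument names are assumptions based on the package documentation.

library(BatchGetSymbols)

df <- BatchGetSymbols(tickers = '^GSPC',
                      first.date = as.Date('2018-01-01'),
                      last.date  = as.Date('2019-01-01'),
                      do.cache = TRUE,
                      cache.folder = 'BGS_Cache')

# running the same call again should hit the local cache and return much faster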

Processamento e Análise de Dados Financeiros e Econômicos com o R

This book introduces the reader to the use of R and RStudio as a platform for processing and analyzing financial and economic data. It presents the entire knowledge base needed to use R, from its installation to the creation of research code. The book is organized around practical code examples that contextualize and facilitate learning at each step of the process. Based on the material in the book, the reader will learn to download economic and financial data from local files or the internet, represent and process that data using R's own language and, finally, create tables and figures to report the results in a technical report.

Looking back in 2017 and plans for 2018

My blog in 2017: as we come close to the end of 2017, it’s time to look back. This has been a great year for me in many ways. This blog started as a way to write short pieces about using R in finance and to promote my book in an organic way. Today, I’m very happy with my decision. Discovering and trying new writing styles keeps my interest very much alive.

Serving shiny apps in the internet with your own server

In this post I’ll share my experience in setting up my own virtual server for hosting shiny applications on Digital Ocean. First, context. I’m working on an academic project where we built a package for accessing financial data and corporate events directly from B3, the Brazilian financial exchange. The objective is to set a reproducible standard and facilitate the acquisition of a large, and very interesting, dataset. The result is GetDFPData.

Package GetDFPData

Financial statements of companies traded at B3 (formerly Bovespa), the Brazilian stock exchange, are available on its website. Accessing the data for a single company is straightforward. On the website one can find a simple interface for accessing this dataset. An example is given here. However, gathering and organizing the data for large-scale research, with many companies and many dates, is painful. Financial reports must be downloaded or copied individually and later aggregated.
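A hedged sketch of how the package is meant to be used; the function name gdfpd.GetDFPData and its arguments are written from memory and may differ from the actual interface, and the company name is just illustrative.

library(GetDFPData)

# function and argument names below are assumptions, not the confirmed API
df.reports <- gdfpd.GetDFPData(name.companies = 'PETROBRAS',
                               first.date = as.Date('2010-01-01'),
                               last.date  = as.Date('2017-12-31'))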

The Brazilian Yield Curve

The latest version of GetTDData offers the function get.yield.curve to download the current Brazilian yield curve directly from Anbima. The yield curve is a financial tool that, based on current prices of fixed income instruments, shows how the market perceives future real, nominal and inflation returns. You can find more details regarding the use and definition of a yield curve on Investopedia. Unfortunately, function get.yield.curve only downloads the current yield curve from the website.
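A quick sketch of the new function, called with no arguments since it only fetches the current curve; the structure of the returned data is not assumed here, so we just inspect it.

library(GetTDData)

# download the current yield curve from Anbima
df.yield <- get.yield.curve()

# inspect whatever columns the function returns
str(df.yield)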

Looking forward to RFinance - Chicago

I’m looking forward to attending the R in Finance conference in Chicago next Friday (2017-05-09). The program looks great! I am really happy, and a bit surprised, to see so many presentations related to market microstructure in the conference. I will talk about my package GetHFData in the first session. This is a package for downloading and aggregating trade data from Bovespa, the Brazilian exchange. More details are available in RBFIN.

Studying CRAN package names

Setting a name for a CRAN package is an intimate process. Out of an infinite range of possibilities, an idea for a package comes along and you spend at least a couple of days writing up and testing your code before submitting it to CRAN. Once you set the name of the package, you cannot change it. Your choice indexes your effort and, not surprisingly, the name of a package can improve its impact.

My Book about using R in Finance

I am very pleased to announce that my book, Processing and Analyzing Financial Data with R, is finally out! This book is an English version of my previous title in Portuguese. This is a long-term project that I plan to keep working on over the years. You can find it on Amazon. Following great titles about R, I decided to also publish an online version with the full content here. More details about the book, including the table of contents, are available on its webpage.

Can we predict stock prices with Prophet?

Facebook recently released an API package allowing access to its forecasting model called prophet. According to the underlying post: It's not your traditional ARIMA-style time series model. It's closer in spirit to a Bayesian-influenced generalized additive model, a regression of smooth terms. The model is resistant to the effects of outliers, and supports data collected over an irregular time scale (including the presence of missing data) without the need for interpolation. The underlying calculation engine is Stan; the R and Python packages simply provide a convenient interface.
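A minimal sketch of the prophet workflow in R, using simulated data in place of actual stock prices; the ds and y column names are what prophet expects.

library(prophet)

# simulated daily series; for the post's question you would use (log) prices
df <- data.frame(ds = seq.Date(as.Date('2015-01-01'), by = 'day', length.out = 500),
                 y  = cumsum(rnorm(500)))

m <- prophet(df)                                   # fit the model
future <- make_future_dataframe(m, periods = 30)   # extend 30 days ahead
forecast <- predict(m, future)                     # forecasts with uncertainty intervals

plot(m, forecast)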

Writing a R book and self-publishing it in Amazon

Many people, including my university colleagues and friends, have asked me about the process of writing a book and self-publishing it on Amazon. You can find the details about the book here and here. Given so much interest, I’m going to report the whole process in this post. First, motivation. Why did I write a book? I am a university professor. Writing is a major part of my work and I really enjoy it.

Using R to study tennis players

In the previous post about tennis, we studied how changes in the balls’ composition on hard and grass courts affected the game back in 2000. In this post, we will analyse a different dataset from the same repository and look at players’ winning records in ATP matches. The data: I’m again using the great repository of tennis data by Jeff Sackmann. In this case, however, I’m using the ATP repository, which contains ATP match data from 1968 until today.

Building and maintaining exams with dynamic content

Part of my job as a researcher and teacher is to periodically apply and grade exams in my classroom. Being constantly in the shoes of an examiner, you quickly realize that students are clever in finding ways to do well in an exam without effort. These days, photos and pdf versions of past exams and exercises are shared online on Facebook, WhatsApp groups, Instagram and whatnot. As weird as it may sound, the distribution of information in the digital era creates a problem for examiners.

How to calculate betas (systematic risk) for a large number of stocks

One of the first examples of using linear regression models in finance is the calculation of betas, the so-called market model. Coefficient beta is a measure of systematic risk and it is calculated by estimating a linear model where the dependent variable is the return vector of a stock and the explanatory variable is the return vector of a diversified local market index, such as the SP500 (US), FTSE (UK), Ibovespa (Brazil), or any other.
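A minimal sketch of the market model for a single stock, using simulated returns; for a large number of stocks you would simply repeat this estimation for each return vector.

set.seed(1)
mkt.ret   <- rnorm(250, mean = 0.0005, sd = 0.01)            # market index returns
stock.ret <- 0.0002 + 1.2*mkt.ret + rnorm(250, sd = 0.015)   # stock with a true beta of 1.2

my.model <- lm(stock.ret ~ mkt.ret)
beta <- coef(my.model)['mkt.ret']   # estimated systematic risk
beta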

Using R to download high frequency trade data directly from Bovespa

Recently, Bovespa, the Brazilian financial exchange company, allowed external access to its ftp site. At this address one can find a lot of information regarding the Brazilian financial system, including datasets with high frequency (tick by tick) trading data for three different markets: equity, options and BMF. Downloading and processing these files, however, can be exhausting. The dataset is composed of zip files with the whole trading data, separated by day and market.
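For context, a rough sketch of the manual work the post is about automating: downloading one of the daily zip files and reading it into R. The url is a placeholder (not the actual ftp address) and the ';' separator is an assumption about the file layout.

# placeholder address; the real ftp path is given in the post
my.url <- 'ftp://example-exchange-ftp/NEG_20161122.zip'
local.zip <- tempfile(fileext = '.zip')
download.file(my.url, destfile = local.zip, mode = 'wb')

# unzip and read the raw trade file (separator assumed to be ';')
my.file <- unzip(local.zip, exdir = tempdir())
df.trades <- read.table(my.file[1], sep = ';', header = FALSE,
                        stringsAsFactors = FALSE)
head(df.trades)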

Financial Data Science

Data science is ...

Processing and Analyzing Financial Data with R

Research Awards

2018
Article “Is predatory publishing a real threat? Evidence from a large database study” in the top 5% of all research outputs scored by Altmetric. [link]
Top 10% of authors on SSRN by all-time downloads. [link]
Top 10% of authors on SSRN by total new downloads within the last 12 months. [link]

2016
RBFIN best paper of 2015 (Honorary mention) - Award from the Brazilian Finance Society for best paper published in the Brazilian Review of Finance for the year of 2015.