1

Say that I want to regress the predictive model: Return_t = x + Volume_t-1 + Volatility_t-1 + e. I have a 5-year weekly panel data with 28 companies already prepared in excel and looks like this:

ID  Date        Return      Volume       Volatility
1   2012-01-10  0.039441572 0.6979594    0.2606079
1   2012-01-17 -0.021107681 0.6447289    0.3741519
1   2012-01-24  0.004798082 1.0072677    0.3097104
1   2012-01-31  0.001559987 1.0066153    0.2761096
1   2012-02-07 -0.009058289 0.7218983    0.2592109
1   2012-02-14  0.046404936 1.2879986    0.4304542
2   2012-01-10  0.02073912 -0.141970906  0.2573633
2   2012-01-17 -0.00369127  0.007792180  0.3360240
2   2012-01-24 -0.05881038  0.001347634  0.2163933
2   2012-01-31 -0.05664598  0.640085029  0.3545598
2   2012-02-07  0.03654193  0.360513703  0.3594383
2   2012-02-14  0.03092432  0.105669775  0.3043643

I want to lag the independent variables setting it to t-1, which package allows me to do that in R? I am going to run a panel data regression with fixed effects.

h3rm4n
  • 4,126
  • 15
  • 21
Neri Kim
  • 135
  • 1
  • 2
  • 9

2 Answers2

3

After grouping by 'ID', we can use lag from dplyr

library(dplyr)
df1 %>% 
  group_by(ID) %>%
  mutate(Volume_1 = lag(Volume), Volatility_1 = lag(Volatility))

Or another option is shift from data.table

library(data.table)
nm1 <- c("Volume", "Volatility")
setDT(df1)[, paste0(nm1, "_1") := lapply(.SD, shift), by = ID, .SDcols = nm1]
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I have problems with using dplyr. When I try to open it I get the error message: Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no package called ‘rlang’ In addition: Warning message: package ‘dplyr’ was built under R version 3.3.3 Error: package or namespace load failed for ‘dplyr’ – Neri Kim Jun 10 '17 at 10:22
  • @NeriKim The new version of R is 3.4.0. Can you update to new version and then install dplyr – akrun Jun 10 '17 at 10:23
  • Ok, thank you. I will look into it – Neri Kim Jun 10 '17 at 10:34
  • I updated R to the new version, now I get the error message: Error: package or namespace load failed for ‘dplyr’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]): there is no package called ‘Rcpp’ – Neri Kim Jun 10 '17 at 11:02
  • @NeriKim Have you tried installing `Rcpp` i.e. `install.packages("Rcpp")` – akrun Jun 10 '17 at 11:03
  • I'm using R studio and I have both versions of R; the old and the updated version. I guess I have to uninstall the old R version, so the packages are moved into the right folder because it says it wasnt able to move the package to the R 3.4 folder.. – Neri Kim Jun 10 '17 at 11:11
  • @NeriKim Are you using windows? – akrun Jun 10 '17 at 11:13
  • yes, I am using windows. – Neri Kim Jun 10 '17 at 11:14
  • @NeriKim That is strange, I am also using windows. I think when you install packages in the new R version, it should ask for creating a new folder, right – akrun Jun 10 '17 at 11:21
  • Maybe I should just uninstall everything and Install it over again.. – Neri Kim Jun 10 '17 at 11:28
  • @NeriKim Let me try to understand, when you are installing packages, is it going to install on system level or does it create a new folder `R` in "Documents" – akrun Jun 10 '17 at 11:31
  • when I tried to install Rcpp, the following message I get is this: – Neri Kim Jun 10 '17 at 11:34
  • Warning in install.packages : cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/src/contrib/PACKAGES.rds': HTTP status was '404 Not Found' Installing package into ‘C:/Users/neri_/OneDrive/Dokumenter/R/win-library/3.4’ (as ‘lib’ is unspecified) Warning in install.packages : cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/3.4/PACKAGES.rds': HTTP status was '404 Not Found' trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.4/Rcpp_0.12.11.zip' Content type 'application/zip' length 3380287 bytes (3.2 MB) downloaded 3.2 MB – Neri Kim Jun 10 '17 at 11:34
  • package ‘Rcpp’ successfully unpacked and MD5 sums checked Warning in install.packages : unable to move temporary installation ‘C:\Users\neri_\OneDrive\Dokumenter\R\win-library\3.4\file4efc2836329c\Rcpp’ to ‘C:\Users\neri_\OneDrive\Dokumenter\R\win-library\3.4\Rcpp’ The downloaded binary packages are in C:\Users\neri_\AppData\Local\Temp\RtmpaOZSUs\downloaded_packages – Neri Kim Jun 10 '17 at 11:34
  • @NeriKim Is there any permission issues – akrun Jun 10 '17 at 11:36
  • I tried to uninstall and install the programs again. Now I can't even open other packages which I could before because of the 'Rcpp' error message. – Neri Kim Jun 10 '17 at 12:13
  • @NeriKim Can you try uninstalling the whole R and do the installation again. Try opening on R console and check – akrun Jun 10 '17 at 12:14
  • I see this is a common problem: https://stackoverflow.com/questions/30308639/r-cran-install-library-rcpp-fails-after-r3-2-upgrade – Neri Kim Jun 10 '17 at 12:15
  • 1
    It was because of the McAfee Antivirus, I deactivated and now it works fine! So sorry about this. – Neri Kim Jun 10 '17 at 12:20
  • So say that my df is called Paneldata, Paneldata %>% group_by(ID) %>% mutate(Volume_1 = lag(Volume), Volatility_1 = lag(Volatility)) Then I got this error message: Error in resolve_vars(new_groups, tbl_vars(.data)) : unknown variable to group by : ID. – Neri Kim Jun 10 '17 at 12:38
  • @NeriKim I am not sure about the error message. Have you loaded plyr package also. Try it on a fresh session with only `dplyr` loaded – akrun Jun 10 '17 at 12:47
  • It seems to be working now. How do I create a new output with the lagged variables? – Neri Kim Jun 10 '17 at 12:55
  • @NeriKim Are you saying that this `PaneldataN <- Paneldata %>% group_by(ID) %>% mutate(Volume_1 = lag(Volume), Volatility_1 = lag(Volatility))` not working – akrun Jun 10 '17 at 12:56
  • It is working now! I messed up a bit. Strange, the "lagged variables" are not lagged though.. – Neri Kim Jun 10 '17 at 13:02
  • @NeriKim What do you mean by not lagged though? Isn't `lag` not working? – akrun Jun 10 '17 at 13:05
  • No, Volume_1 and Volatility_1 are not lagged with t-1. Could it be that the df is in the wrong format? This df is imported directly from excel.. – Neri Kim Jun 10 '17 at 13:10
  • @NeriKim It could be. What is the `str(Paneldata)` ? – akrun Jun 10 '17 at 13:11
  • Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 7280 obs. of 6 variables: $ Company ID: num 1 1 1 1 1 1 1 1 1 1 ... $ Obs : num 1 2 3 4 5 6 7 8 9 10 ... $ Date : POSIXct, format: "2012-01-10" "2012-01-17" ... $ Return : num 0.03944 -0.02111 0.0048 0.00156 -0.00906 ... $ Volume : num 0.698 0.645 1.007 1.007 0.722 ... $ Volatility: num 0.261 0.374 0.31 0.276 0.259 ... – Neri Kim Jun 10 '17 at 13:13
  • @NeriKim It looks alright. Can you try `dplyr::lag(Volume), Volatility_1 = dplyr::lag(Volatility)` in the `mutate` in case `lag` got masked – akrun Jun 10 '17 at 13:14
  • 1
    Thank you so much! Now everything is working! – Neri Kim Jun 10 '17 at 13:24
  • @NeriKim Great, there is `lag` from `base R`, that could have masked it – akrun Jun 10 '17 at 13:25
  • Sorry, there is one final thing remaining. The lag should be within group, in this case within the Company ID. I apologize for not making this clear. Is that still possible in dplyr? – Neri Kim Jun 10 '17 at 13:43
  • To be more precise, the first observation within the groups (Company ID) should be NA, like it is in the first group – Neri Kim Jun 10 '17 at 13:45
  • @NeriKim yes, it is within group, we used `group_by(ID)` i.e. `df1 %>% group_by(ID) %>% mutate(Volume_1 = dplyr::lag(Volume), Volatility_1 = dplyr::lag(Volatility)) # A tibble: 12 x 7 # Groups: ID [2] ID Date Return Volume Volatility Volume_1 Volatility_1 1 1 2012-01-10 0.039441572 0.697959400 0.2606079 NA NA` – akrun Jun 10 '17 at 13:45
  • Hmm, but in the output it takes the last observation from the previous group instead of NA – Neri Kim Jun 10 '17 at 13:53
  • @NeriKim What is the version of your `dplyr`? I am getting NA – akrun Jun 10 '17 at 13:54
  • It is 0.5, should I update it to 0.7? – Neri Kim Jun 10 '17 at 13:55
  • @NeriKim yes, mine is 0.7 – akrun Jun 10 '17 at 13:57
  • Hmm, I trying to update it, but it wont.. :/ – Neri Kim Jun 10 '17 at 14:07
  • nevermind, newbie as I am, I opened dplyr befor plyr. I closed R and opened it again and opened the packages in the correct order. That was the problem, now it is fine. Sorry about this. Still new in R – Neri Kim Jun 10 '17 at 14:15
  • @NeriKim Yes, plyr::summarise or `plyr::mutate` might have messed it. If there are same functions from different packages, use `dplyr::mutate(Volume_1 = ...` – akrun Jun 10 '17 at 14:16
  • 1
    Anyway, I cannot thank enough for your help. You have saved my thesis! – Neri Kim Jun 10 '17 at 14:20
  • @NeriKim Glad to be part of your success – akrun Jun 10 '17 at 14:21
0

You can also use mutate_at and then join back:

df %>% 
  mutate_at(4:5, lag) %>% 
  left_join(df, ., by = c('ID','Date','Return'))

The output:

   ID       Date       Return     Volume.x Volatility.x     Volume.y Volatility.y
1   1 2012-01-10  0.039441572  0.697959400    0.2606079           NA           NA
2   1 2012-01-17 -0.021107681  0.644728900    0.3741519  0.697959400    0.2606079
3   1 2012-01-24  0.004798082  1.007267700    0.3097104  0.644728900    0.3741519
4   1 2012-01-31  0.001559987  1.006615300    0.2761096  1.007267700    0.3097104
5   1 2012-02-07 -0.009058289  0.721898300    0.2592109  1.006615300    0.2761096
6   1 2012-02-14  0.046404936  1.287998600    0.4304542  0.721898300    0.2592109
7   2 2012-01-10  0.020739120 -0.141970906    0.2573633  1.287998600    0.4304542
8   2 2012-01-17 -0.003691270  0.007792180    0.3360240 -0.141970906    0.2573633
9   2 2012-01-24 -0.058810380  0.001347634    0.2163933  0.007792180    0.3360240
10  2 2012-01-31 -0.056645980  0.640085029    0.3545598  0.001347634    0.2163933
11  2 2012-02-07  0.036541930  0.360513703    0.3594383  0.640085029    0.3545598
12  2 2012-02-14  0.030924320  0.105669775    0.3043643  0.360513703    0.3594383
h3rm4n
  • 4,126
  • 15
  • 21