I'm a beginner in R, so please excuse me in advance if my question seems basic, but I'm quite lost among all the tools that can be used to split a data set (lubridate, split, daply...).

My data consists of daily stock market index returns from 1978 to 2017. I would like to regress y1 on x1, x2, x3 and x4 every 6 months. For information, there are 262 observations each year, so 6 months = 131 trading days.

Here is an extract of my (Excel) data:

Date y1 x1 x2 x3 x4
01/06/78 -0,054728735 0,062336581 -0,017447642 0,018066145 0,0137291
01/09/78 -0,0633203 0,051713026 -0,025691177 -0,006909645 -0,015750265
01/10/78 -0,048852901 0,026756766 -0,00910902 -0,013302491 -0,025715185
01/11/78 -0,049357647 0,013119868 -0,005255487 -0,008035708 -0,01565239
01/12/78 -0,044503679 -0,029061109 -0,016565941 -0,01131818 -0,008933417
01/13/78 -0,027863545 -0,044460617 -0,012819194 -0,021071992 -0,015533829
01/16/78 -0,026495125 -0,056336531 -0,007379243 -0,003360595 0,01056797
01/17/78 -0,007670981 -0,041300771 0 -0,00111657 0,019498044
01/18/78 0,000662032 -0,031227275 0,003506725 -0,018967432 -0,003861009

I think that the best way to do that is (i) to split my data every 6 months, then (ii) to run a linear regression on each 6-month segment.

Could you tell me whether this is the best way, in your opinion? If so, could you show me some code to split a data set every 6 months?

Thanks a lot for your help!

tristanjou
  • Can you add the output of dput to your question rather than this raw data? It will be much easier to reproduce. See this for help: https://stackoverflow.com/a/5963610/2179336 – Dhawal Kapil Aug 21 '17 at 12:47

1 Answer

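You could create a sequence of numbers corresponding to the first row of each 131-row block of your data, followed by one extra value just past the last row:

# nrow(dat)+1 marks the end, so that the last row is not dropped by the split
i = c(seq(1,nrow(dat),by=131),nrow(dat)+1)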

And then use lapply to split the data:

lapply(seq_along(i)[-length(i)],function(x) dat[i[x]:(i[x+1]-1),])
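The same idea can be tried on made-up data. This is only a sketch: `dat` here is simulated with the question's column names, and `block` and `chunks` are names chosen for illustration. The end marker is `nrow(dat) + 1` so that the final row ends up in the last subset, and the result is stored in a list so each subset can be pulled out with `[[ ]]`.

```r
# Simulated stand-in for the real data (values are made up for illustration)
set.seed(1)
n <- 300
dat <- data.frame(y1 = rnorm(n), x1 = rnorm(n), x2 = rnorm(n),
                  x3 = rnorm(n), x4 = rnorm(n))

block <- 131  # rows per 6-month window, as stated in the question

# Start row of each block, plus one end marker just past the last row
i <- c(seq(1, nrow(dat), by = block), nrow(dat) + 1)

# One data frame per block; the last one may be shorter than 131 rows
chunks <- lapply(seq_along(i)[-length(i)],
                 function(x) dat[i[x]:(i[x + 1] - 1), ])

length(chunks)        # number of subsets
head(chunks[[2]])     # first rows of the second subset
sapply(chunks, nrow)  # number of rows in each subset
```

Because `chunks` is an ordinary list, `chunks[[3]]` is itself a data frame and can be used anywhere `dat` could be.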
count
  • Thanks! The code is working. But I still have 2 questions. 1/ How can I access each of the subsets that have been created? For example, how can I work with the second or the third subset? 2/ I want to run a regression on each of my subsets. Should I use lapply again, or do you think that a loop is better suited? – tristanjou Aug 21 '17 at 14:22
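One possible way to handle the regression step is another `lapply` over the list of subsets. The sketch below runs on simulated data, not the real returns; `grp`, `chunks`, `fits` and `coefs` are illustrative names. It uses base R's `split()` with a block index as an alternative to the row-sequence approach, and it assumes the rows are already sorted by date.

```r
# Simulated stand-in for the real data (values are made up for illustration)
set.seed(42)
n <- 524  # four blocks of 131 rows
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
dat$y1 <- 0.5 * dat$x1 - 0.2 * dat$x2 + rnorm(n, sd = 0.1)

# Label each row with its block number: rows 1-131 -> 1, 132-262 -> 2, ...
grp <- ceiling(seq_len(nrow(dat)) / 131)
chunks <- split(dat, grp)

# Fit the same linear model on every subset
fits <- lapply(chunks, function(d) lm(y1 ~ x1 + x2 + x3 + x4, data = d))

summary(fits[[2]])              # full regression output for the second subset
coefs <- t(sapply(fits, coef))  # one row of coefficients per subset
coefs
```

A `for` loop would give the same fits; `lapply` simply returns them already collected in a list, so `fits[[k]]` is the model for the k-th 6-month block.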