
For my bachelor project I am conducting a performance analysis in R. Unfortunately I have no experience with loops and could not find a suitable solution.

I have a dataframe with daily data that is divided into 3 different volumes/sizes. For one stock I need about 2000 lines of code to analyze it on a weekly basis. My dataframe contains about 99 shares in 3 different sizes, so I would have to copy/paste the ISIN number every time.

To give an impression, here is the code:

w01.2016_DE0005089031 <- df_DE0005089031 %>% 
  dplyr::select(Date, ClosingPrice ) %>%
  dplyr::filter(Date >= as.Date("2016-01-04") & Date <= as.Date("2016-01-10"))

w01.2016_DE0005089031 <- log((w01.2016_DE0005089031[ nrow(w01.2016_DE0005089031),2]) / w01.2016_DE0005089031[1,2])
.
.up to
.
w18.2020_DE0005089031 <- df_DE0005089031 %>% 
  dplyr::select(Date, ClosingPrice ) %>%
  dplyr::filter(Date >= as.Date("2020-04-27") & Date <= as.Date("2020-04-30"))

w18.2020_DE0005089031 <- log((w18.2020_DE0005089031[ nrow(w18.2020_DE0005089031),2]) / w18.2020_DE0005089031[1,2])

and

df01_w01_2016_SmallCap <- df %>%
  dplyr::select(Underlying, Date, Size, ClosingPrice, NumberOfBuyers, NumberOfSellers) %>%
  dplyr::filter(Underlying == "DE0005089031", Size == "SmallCap", Date >= as.Date("2016-01-04") & Date <= as.Date("2016-01-10"))
.
.up to
.
df01_w18_2020_SmallCap <- df %>%
  dplyr::select(Underlying, Date, Size, ClosingPrice, NumberOfBuyers, NumberOfSellers) %>%
  dplyr::filter(Underlying == "DE0005089031", Size == "SmallCap", Date >= as.Date("2019-04-27") & Date <= as.Date("2019-03-30"))

In addition, there are various merges and analyses.

Is it possible to write a loop so that R does the same operations for the other 99 ISIN numbers (e.g. DE0005089031, DE000KGX8881, DE0006070006, etc.) without copying/pasting and doing them all by hand?

I am thankful for any help. :)

Edit 2: my df looks like this:

Underlying       Date ClosingPrice      Size NumberOfBuyers NumberOfSellers
DE0005089031 04.01.2016      49.5010  SmallCap              0               0                    
DE0005089031 04.01.2016      49.5010 MediumCap              0               0                    
DE0005089031 04.01.2016      49.5010  LargeCap              1               2                    
DE0005089031 05.01.2016      49.7855  SmallCap              0               0                    
DE0005089031 05.01.2016      49.7855 MediumCap              0               1                    
DE0005089031 05.01.2016      49.7855  LargeCap              0               0 
.
.up to
.
 Underlying       Date ClosingPrice      Size NumberOfBuyers NumberOfSellers 
NL0012169213 29.04.2020        38.60 MediumCap              0               0                    
NL0012169213 29.04.2020        38.60  LargeCap              0               0                    
NL0012169213 29.04.2020        38.60  SmallCap              0               1                    
NL0012169213 30.04.2020        37.79 MediumCap              0               0                    
NL0012169213 30.04.2020        37.79  LargeCap              1               0                    
NL0012169213 30.04.2020        37.79  SmallCap              1               1                                      

It has 412,539 obs. of 7 variables covering 99 shares.

Qncy
  • Welcome to SO! In general, yes, it's possible. However, it looks like you hardcoded your ISIN number into the name of the data.frame, which is not good style (because you can't easily automate the analysis over several ISINs; it's easier to have everything in one data.frame). Please give an example of how your data looks and a more detailed description of what you want to do. – starja Jun 01 '20 at 09:20
  • Thank you! I edited the post with the df. – Qncy Jun 01 '20 at 09:51
  • You can do the operation per underlying using `group_by(underlying)` . – Ronak Shah Jun 01 '20 at 09:52
  • Please don't post data as images. Take a look at [how to make a great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to show data. – Martin Gal Jun 01 '20 at 10:01
  • Oh sorry, I edited. I will try such an operation asap. Thank you, I'll keep you posted on how it went. – Qncy Jun 01 '20 at 10:16

1 Answer


As @Ronak Shah pointed out, you should take a look at group_by, which groups your data according to a column (in your case Underlying) before the subsequent analysis is done. However, the first part of your analysis can also be simplified. You want the analysis done for every week, so first calculate the week (using the code from here) and then group by week as well. With summarise, you only keep the result of your calculations. Try to get a better grip of what dplyr can do.

library(dplyr)
library(lubridate)

data_raw <- "Underlying       Date ClosingPrice      Size NumberOfBuyers NumberOfSellers
DE0005089031 04.01.2016      49.5010  SmallCap              0               0                    
DE0005089031 04.01.2016      49.5010 MediumCap              0               0                    
DE0005089031 04.01.2016      49.5010  LargeCap              1               2                    
DE0005089031 05.01.2016      49.7855  SmallCap              0               0                    
DE0005089031 05.01.2016      49.7855 MediumCap              0               1                    
DE0005089031 05.01.2016      49.7855  LargeCap              0               0
DE0005089031 14.01.2016      49.5010  SmallCap              0               0                    
DE0005089031 14.01.2016      59.5010 MediumCap              0               0                    
DE0005089031 14.01.2016      69.5010  LargeCap              1               2                    
DE0005089031 15.01.2016      79.7855  SmallCap              0               0                    
DE0005089031 15.01.2016      89.7855 MediumCap              0               1                    
DE0005089031 15.01.2016      99.7855  LargeCap              0               0"

data <- read.table(text = data_raw, header = TRUE)

data %>% 
  # parse the German-style date and compute a week index relative to the first date
  mutate(Date = as.Date(Date, format = "%d.%m.%Y"),
         week = (year(Date) - year(min(Date))) * 52 + 
           week(Date) - week(min(Date))) %>% 
  # one group per share and week
  group_by(Underlying, week) %>% 
  # log ratio of the first to the last closing price within each week
  # (your original code computes last / first, so swap the indices if you need that direction)
  summarise(calculation = log(ClosingPrice[1] / ClosingPrice[n()]))
# A tibble: 3 x 3
# Groups:   Underlying [1]
  Underlying    week calculation
  <fct>        <dbl>       <dbl>
1 DE0005089031     0    -0.00573
2 DE0005089031     1    -0.339  
3 DE0005089031     2    -0.224  
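
If you also want the second block from your question (the per-Size data with NumberOfBuyers/NumberOfSellers), the same scheme extends to more grouping columns and more summary columns. A minimal sketch along those lines, assuming the column names from your df (the names weekly_return, buyers and sellers are just illustrative):

data %>% 
  mutate(Date = as.Date(Date, format = "%d.%m.%Y"),
         week = (year(Date) - year(min(Date))) * 52 + 
           week(Date) - week(min(Date))) %>% 
  # one group per share, size and week
  group_by(Underlying, Size, week) %>% 
  summarise(weekly_return = log(ClosingPrice[n()] / ClosingPrice[1]),  # last / first, as in your original code
            buyers  = sum(NumberOfBuyers),
            sellers = sum(NumberOfSellers))

This way a single call replaces the hand-written df01_w.._SmallCap blocks for all 99 underlyings at once.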
starja
  • `# A tibble: 96 x 3 # Groups: Underlying [96] Underlying week calculation 1 DE0005089031 NA 0.461 2 DE0005140008 NA 1.16 3 DE0005158703 NA -0.380 4 DE0005190003 NA 0.555 5 DE0005200000 NA -0.154 6 DE0005313704 NA -1.18 7 DE0005408116 NA 0.653 8 DE0005419105 NA -0.0399 9 DE0005437305 NA -0.728 10 DE0005439004 NA 1.03 # ... with 86 more rows` I get this result @starja. Unfortunately I get NAs, and I think it jumps directly from one share to the next. – Qncy Jun 01 '20 at 18:43
  • The next question would be: if it works with group_by, can I calculate my correlations, betas and risk-free interest rates for each individual underlying using the same scheme? I use index data from Yahoo Finance in an extra dataframe. – Qncy Jun 01 '20 at 19:00
  • It seems that the calculation of the week goes wrong. Have a look at this column and try to find out for which entries and why it goes wrong (the entry is `NA`); a sketch of an alternative week calculation is below. In general, yes, it is possible to calculate other things with this scheme, see e.g. [here](https://stackoverflow.com/questions/50925734/dplyr-use-a-custom-function-in-summarize-after-group-by). Also, if you use `dplyr` in this way, it's easier to have all data for the calculation in one data.frame. – starja Jun 01 '20 at 19:57
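
As a follow-up to the NA problem: a more robust way to define the week, sketched here under the assumption that the dates parse correctly (drop the format argument if Date is already a Date object), is to label each week by the Monday it starts on with lubridate::floor_date, which also avoids the year-boundary issue of the 52-week formula:

library(dplyr)
library(lubridate)

data %>% 
  mutate(Date = as.Date(Date, format = "%d.%m.%Y"),               # returns NA if the dates are stored in another format
         week = floor_date(Date, unit = "week", week_start = 1)) %>%  # Monday of the calendar week
  group_by(Underlying, week) %>% 
  summarise(calculation = log(ClosingPrice[n()] / ClosingPrice[1]))

Other statistics (correlations, betas, etc.) can then be added as further columns inside summarise(), as described in the linked question, provided the index data is joined into the same data.frame.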