These are the data transformation steps to put your data set into a state
where you can calculate correlations easily:
- Extract week and year from the Date column to create a week identifier that is unique
across years (week() and year() are lubridate functions).
- Drop Date column.
- Make the dataset wide, so that the value for each stock is in a separate
column (pivot_wider() is a tidyr function).
Code:
library(lubridate)
library(dplyr)
library(tidyr)
week_stocks <-
  stocks %>%
  mutate(Week = paste(year(Date), week(Date), sep = "_")) %>%
  select(StockID, Value, Week) %>%
  pivot_wider(names_from = StockID, values_from = Value)
After the transformation you use cor() to get correlations of all stocks.
Since you are only interested in the correlations with one specific stock,
you can use select() to drop all other stocks and their correlations.
cor(week_stocks[-1]) %>%
  as_tibble(rownames = "stockIDs") %>%
  select(stockIDs, `210449`)
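One thing to watch for: after pivot_wider(), a stock without a value in some week gets NA in that row, and cor() with its default settings then returns NA for every pair involving that stock. The sketch below shows one possible way to handle this with the use argument of cor(). The toy data and the stock IDs 111 and 222 are made up for illustration, and whether pairwise deletion is appropriate depends on your data.

```r
library(lubridate)
library(dplyr)
library(tidyr)

# Hypothetical data: stock 111 has no value in the last week
toy <- tibble(
  StockID = c(111, 111, 111, 222, 222, 222, 222),
  Date = as.Date(c("2015-09-11", "2015-09-18", "2015-09-25",
                   "2015-09-11", "2015-09-18", "2015-09-25", "2015-10-02")),
  Value = c(1, 2, 3, 2, 4, 6, 9)
)

# Same transformation as above: one row per week, one column per stock
toy_wide <- toy %>%
  mutate(Week = paste(year(Date), week(Date), sep = "_")) %>%
  select(StockID, Value, Week) %>%
  pivot_wider(names_from = StockID, values_from = Value)

# Default behaviour: any NA in a column makes its correlations NA
cor_default <- cor(toy_wide[-1])

# Pairwise deletion: each pair uses only the weeks where both stocks have values
cor_pairwise <- cor(toy_wide[-1], use = "pairwise.complete.obs")
```

With pairwise deletion each correlation may be based on a different set of weeks, so it is worth checking how many weeks each pair of stocks actually shares.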
Some general remarks:
- In the code in your question you use the attach() command. It is generally
not recommended to use it for data frames, as it can lead to confusion and
errors; see this blogpost.
- If you are looking to improve your R skills, check out the
tidyverse and its packages. It is a great set of packages
that share a common concept for data science operations, which is very powerful and allows you
to solve most data science problems with a small set of concise commands.
- When asking a question on StackOverflow it is usually good practice to
include your data or at least parts of it. dput() provides the code needed
for that. By pasting the output of dput() into your question, other users
can recreate your data in their environment and develop their answer while testing
their code against your data. For example, this is what the dput() output of
the dummy dataset I used to create the code in my answer looks like.
Code:
dput(stocks)
Output:
structure(list(StockID = c(16139, 210449, 210449, 210449, 210449,
210449), Date = c("2015-09-11", "2015-09-11", "2015-09-18", "2015-09-25",
"2015-10-02", "2015-10-09"), Value = c(0.055063, 0.01851903,
0.01338099, 0.03982749, 0.04798457, 0.02433628)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
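As a rough sketch of the remarks on dput() and attach(): someone reading the question can paste the output above, assign it to a name, and then work with the columns without attaching the data frame. The variable names mean_value_dollar and mean_value_with are only illustrative.

```r
# Recreate the data by assigning the pasted dput() output to a name
stocks <- structure(list(StockID = c(16139, 210449, 210449, 210449, 210449,
210449), Date = c("2015-09-11", "2015-09-11", "2015-09-18", "2015-09-25",
"2015-10-02", "2015-10-09"), Value = c(0.055063, 0.01851903,
0.01338099, 0.03982749, 0.04798457, 0.02433628)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))

# Instead of attach(stocks), refer to columns explicitly with $ ...
mean_value_dollar <- mean(stocks$Value)

# ... or evaluate an expression inside the data frame with with()
mean_value_with <- with(stocks, mean(Value))
```

Both approaches keep it visible which data frame a column comes from, which is the main thing attach() hides.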