
For a university project, I'm working with large stock price data frames.

I have two dataframes.

Dataframe df1 contains the daily close prices over a certain period. The header contains the stocks' shortcuts.

Dataframe df2 contains the stocks' shortcuts in the first column and the industry name of each stock's firm in the second column. Important: df2 contains more shortcuts than df1, but every shortcut in df1 should also appear in df2.

Is there a way to insert the second column of df2 as the first row of df1 wherever they match (that is, where a df1 column name equals a value in the first column of df2)?

# Example Code


df1 <- as.data.frame(matrix(runif(20, min = 0, max = 1), nrow = 4))

df2 <- data.frame(
  Column2 = c("V1", "V829", "V2", "V3", "V493", "V4", "V5", "V6", "V992", "V7"),
  test = c("test1", "test2", "test3", "test4", "test5", "test6", "test7", "test8", "test9", "test10")
)

df1
df2

# Now insert df2$test into (or above) df1[1, ] as a row wherever names(df1) matches df2$Column2


[Image: DataFrame df1]

[Image: DataFrame df2]

Thank you for your answers guys!

Nino

nino123

1 Answer


I would recommend you reshape your df1 into long format (see Reshaping data.frame from wide to long format).

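library(tidyr)
library(dplyr)  # needed for left_join() below
# Gather every column except X into Instrument/value pairs
df1_long <- df1 %>% gather(Instrument, value, -X)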

I would organize the data this way because it makes it easier to use left_join() to match the data frames (see the description of mutating joins on the data wrangling cheat sheet).

df <- left_join(df1_long, df2, by = "Instrument")
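Applied to the example code from the question, the two steps might look like the sketch below. It rests on two assumptions: the example df1 has no identifier column, so a hypothetical row index called day is added, and the key column in df2 is named Column2 (as in the question's code) rather than Instrument.

```r
library(tidyr)
library(dplyr)

# Example data as in the question
df1 <- as.data.frame(matrix(runif(20, min = 0, max = 1), nrow = 4))
df2 <- data.frame(
  Column2 = c("V1", "V829", "V2", "V3", "V493", "V4", "V5", "V6", "V992", "V7"),
  test = c("test1", "test2", "test3", "test4", "test5",
           "test6", "test7", "test8", "test9", "test10"),
  stringsAsFactors = FALSE
)

# 'day' is a hypothetical row identifier; the example df1 has none
df1$day <- seq_len(nrow(df1))

# Wide to long: one row per (day, Instrument) pair
df1_long <- df1 %>% gather(Instrument, value, -day)

# Attach the label from df2; the key columns have different names here
df <- left_join(df1_long, df2, by = c("Instrument" = "Column2"))
head(df)
```

Because it is a left join, the extra entries in df2 (V829, V493, V992) are simply ignored, and any instrument missing from df2 would get NA in the test column.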

If you want, you can then make your data frame wide again using the spread() function, which is the reverse of gather().
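One thing to watch when going back to wide format: spread() treats every column other than the key and the value as an identifier. If the label column from df2 is still present, the result gets one row per combination of identifier and label, mostly filled with NA. A possible workaround, sketched here on a small made-up data frame (the column names day, test and key are assumptions for illustration), is to merge the instrument and its label into a single key first.

```r
library(tidyr)
library(dplyr)

# Small long-format stand-in for the joined data (hypothetical values)
df <- data.frame(
  day = rep(1:2, times = 2),
  Instrument = rep(c("V1", "V2"), each = 2),
  value = c(0.1, 0.2, 0.3, 0.4),
  test = rep(c("test1", "test3"), each = 2),
  stringsAsFactors = FALSE
)

# Combine instrument and label into one key column, then spread,
# so the result has one column per instrument and one row per day
df_wide <- df %>%
  unite(key, Instrument, test, sep = "_") %>%
  spread(key, value)
df_wide
```

This puts the label into the column name (for example V1_test1) instead of into a separate row, which keeps the price columns numeric. In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(), although both still work.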

For the future, I recommend that you provide a reproducible example rather than linking image files of your data frames: the links might expire, and images generally make it less likely that you will get an answer on Stack Overflow.

Megazord
  • Thanks for your answer. It works well up to the step that makes the data frame wide again. Shouldn't it be: df_new <- df %>% spread(Instrument, value, df1_Column)? My RStudio freezes every time I run it. – nino123 Oct 20 '22 at 13:59
  • Theoretically, ```df_new <- df %>% spread(Instrument, value)``` would suffice. Perhaps, however, you could consider keeping the data in long format? The format is ideal for downstream use with all the tidyverse tools. – Megazord Oct 20 '22 at 14:57