Joining dataframe with dataframes (as lists?) or creating new variabels

Question

My code looks like this:

# Dataset 1 

df1 <- data.frame(date = c("2022-12-31", "2022-09-30","2022-06-30", "2022-03-31"))


### Dataset 2 

df2 <- data.frame(date = c("2022-12-31", "2022-09-30","2022-06-30", "2022-03-31"), pe_ratio = c(22.0,22.6,22.5,28.2))

### What I want 


df3 <- data.frame(date = c("2022-12-31", "2022-09-30","2022-06-30", "2022-03-31"))
df3$pe_ratio_minus_1 <- NA
df3$pe_ratio_minus_2 <- NA
df3$pe_ratio_minus_3 <- NA
df3$pe_ratio_minus_1[1] <- df2$pe_ratio[2] 
df3$pe_ratio_minus_2[1] <- df2$pe_ratio[3]
df3$pe_ratio_minus_3[1] <- df2$pe_ratio[4]


> df3
        date pe_ratio_minus_1 pe_ratio_minus_2 pe_ratio_minus_3
1 2022-12-31             22.6             22.5             28.2
2 2022-09-30               NA               NA               NA
3 2022-06-30               NA               NA               NA
4 2022-03-31               NA               NA               NA

So what I want to do: For the date variable in df 1 I wanna create for each value in the df2 of pe_ratio a single variable for the previously value in df1. But the value of the actual date in df1 should not be contained.

Welcome to SO! This is a lot of code, most of which is irrelevant. Could you edit your question to make it a **minimal** reproducible example? You may find this link useful: [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) (specific for R) and [How to create a Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) — jpsmith, Apr 24 '23 at 13:05
There is a reproducible example. Thats because there is so much code. I structured it. I hope this helps. — user149240, Apr 24 '23 at 13:11
We really don't need the journey, and the list of required packages is not small (`tidyverse` is a large set), not to mention I cannot find `row_to_names` within your packages, is that from `janitor`? Your question can likely be reduced to two sample datasets: `pe_ratio` and `df`. Even keeping your full code if you want, can you add the output from each of `dput(head(df))` and `dput(head(pe_ratio))`? — r2evans, Apr 24 '23 at 13:20
I appreciate that you provided code to reproduce the data, but again, emphasis is on the **minimal** in MRE. You will get much better, faster help if you remove the code and libraries that are irrelevant to answer the question and ask your specific coding question clearly. Just trying to help - good luck! — jpsmith, Apr 24 '23 at 13:21
Thanks. I see it is to complex for a simple question. I rewrite my example at the moment. Thanks for the tips. — user149240, Apr 24 '23 at 13:23
This is much simpler and clearer! Thanks! But now I have questions because your description doesn't seem to match your `df3` result. The first row makes sense, but why is the 2nd row all `NA` values, shouldn't the `2022-09-30` row have a `pe_ratio_minus_1` value of 22.5, etc.? — Gregor Thomas, Apr 24 '23 at 13:48

score 3 · Accepted Answer · answered Apr 24 '23 at 13:49

3

I think this is what you want, but it has fewer missing values than your result...

library(dplyr)
df1 %>% 
  left_join(df2, by = "date") %>%
  arrange(desc(date)) %>% # make sure things are in order
  mutate(
    pe_ratio_minus_1 = lead(pe_ratio, 1),
    pe_ratio_minus_2 = lead(pe_ratio, 2),
    pe_ratio_minus_3 = lead(pe_ratio, 3)
  ) %>%
  select(-pe_ratio)
#         date pe_ratio_minus_1 pe_ratio_minus_2 pe_ratio_minus_3
# 1 2022-12-31             22.6             22.5             28.2
# 2 2022-09-30             22.5             28.2               NA
# 3 2022-06-30             28.2               NA               NA
# 4 2022-03-31               NA               NA               NA

answered Apr 24 '23 at 13:49

Gregor Thomas

136,190
20
167
294

Hello. Thanks. This works. I noticed that I have data where the date are not all equal. Can you tell me what I have do to in this case? For example: Date in df1 is "2022-09-30" but in the df2 it is "2022-08-30". So in df1 it is ahead butt both regard to the same period. – user149240 Apr 24 '23 at 15:05
1

I'd suggest asking a new question illustrating that more complex problem. – Gregor Thomas Apr 24 '23 at 15:09

score 0 · Answer 2 · answered Apr 24 '23 at 14:11

Using data.table

library(data.table)
 setDT(df1)[df2, on = .(date)][order(-date),
  paste0("pe_ratio_minus_", 1:3) := shift(pe_ratio, n = 1:3, type = "lead")][]

-output

        date pe_ratio pe_ratio_minus_1 pe_ratio_minus_2 pe_ratio_minus_3
1: 2022-12-31     22.0             22.6             22.5             28.2
2: 2022-09-30     22.6             22.5             28.2               NA
3: 2022-06-30     22.5             28.2               NA               NA
4: 2022-03-31     28.2               NA               NA               NA

score 0 · Answer 3 · answered Apr 24 '23 at 14:30

Another variant in base using outer to place the match'ed dates.

df1[paste0("pe_ratio_minus_", 1:3)] <- 
  df2$pe_ratio[match(df1$date, df2$date)[outer(seq_len(nrow(df1)), 1:3, `+`)]]
df1
#        date pe_ratio_minus_1 pe_ratio_minus_2 pe_ratio_minus_3
#1 2022-12-31             22.6             22.5             28.2
#2 2022-09-30             22.5             28.2               NA
#3 2022-06-30             28.2               NA               NA
#4 2022-03-31               NA               NA               NA

Joining dataframe with dataframes (as lists?) or creating new variabels

3 Answers3