Compare two columns of two different data frames with different length of rows return a third row

Question

I have two different df which have the same columns: "O" for place and "date" for time. Df 1 gives different information for a certain place (O) and time (date) in one 1 row and df 2 has many information for the same year and place in many different rows. No I want to extract one condition of the first df that applies for all the rows of the second df if values for "O" and "date" are equal.

To make it more clear: I have one line in df 1: krnqm=250 for O=1002 and date=1885. Now I want a new column "krnqm" in df 2 where df2$krnqm = 250 for all rows where df2$0=1002 and df2$date=1885.

Unfortunately I have no idea how to put that condition into a code line and would be greatful for your help.

Welcome to Stack Overflow! Help others help you: Provide a [mcve]. You can check out [How to make a great R reproducible example](https://stackoverflow.com/q/5963269/8386140) for tips. In particular, you should [edit] your question to include (at least a subset of) your data by using the output of the R command `dput()` — duckmayr, Mar 15 '20 at 12:25
did you look into the join functions? you can do `df2 %>% left_join(df1, by = c("O","date"))` If you do not want to keep the rows where krnqm is empty (that is those without a match) just add `%>% filter(!is.na(krnqm))` — Annet, Mar 15 '20 at 12:28

score 0 · Answer 1 · answered Mar 15 '20 at 13:48

You can do this quite easily in base R using the merge function. Here's an example.

Simulate some data from your description:

df1 <- expand.grid(O = letters[c(2:4,7)], date = c(1,3))
df2 <- data.frame(O = rep(letters[1:6], c(2,3,3,6,2,2)), date = rep(1:3, c(3,2,4)))
df1$krnqm <- sample(1:1000, size = nrow(df1), replace=T)

> df1
  O date krnqm
1 b    1   833
2 c    1   219
3 d    1   773
4 g    1   514
5 b    3   118
6 c    3   969
7 d    3   704
8 g    3   914
> df2
   O date
1  a    1
2  a    1
3  b    1
4  b    2
5  b    2
6  c    3
7  c    3
8  c    3
9  d    3
10 d    1
11 d    1
12 d    1
13 d    2
14 d    2
15 e    3
16 e    3
17 f    3
18 f    3

Now let's combine the two data frames in the manner you describe.

df2 <- merge(df2, df1, all.x=T)
> df2
   O date krnqm
1  a    1    NA
2  a    1    NA
3  b    1   833
4  b    2    NA
5  b    2    NA
6  c    3   969
7  c    3   969
8  c    3   969
9  d    1   773
10 d    1   773
11 d    1   773
12 d    2    NA
13 d    2    NA
14 d    3   704
15 e    3    NA
16 e    3    NA
17 f    3    NA
18 f    3    NA

So you can see, the krnqm column in the resulting data frame contains NAs for any combinations of 'O' and 'date' that were not found in the data frame where the krnqm values were extracted from. If your df1 has other columns, that you do not want to be included in the merge, just change the merge call slightly to only use those columns that you want: df2 <- merge(df2, df1[,c("O", "date", "krnqm")], all.x=T).

Good luck!

Compare two columns of two different data frames with different length of rows return a third row

1 Answers1