6

I have the following two data frames:

df1

id   V1 V2 V3
210  4  NA 7
220  NA NA NA
230  2  0  1
240  4  NA NA
250  1  9  2
260  6  5  NA
270  0  NA 3

df2

id   V1 V2 V3
210  4  3  7
240  4  3  NA
270  0  3 3

df2 is all the instances where df1 has NA in V2 and at least one numeric value in V1 or V3. Where this condition holds, I have changed the NAs in V2 to '3'.

I would now like to put these dfs back together. Specifically, I would like to replace all the rows in df1 that appear in df2. My expected output is this:

id   V1 V2 V3
210  4  3 7
220  NA NA NA
230  2  0  1
240  4  3 NA
250  1  9  2
260  6  5  NA
270  0  3 3

I have looked at this question, but it does this based on specific values in the df. And this question is similarly answered by specifying the actual values to replace. My real df is huge and all I want to do is put the two dfs together, replacing the rows that appear in both with df2.

Community
  • 1
  • 1
szi
  • 63
  • 1
  • 1
  • 4

3 Answers3

12

A simple match call that will identify the instances that match df2$id within df1$id (in the correct appearance order) will solve this problem

df1[match(df2$id, df1$id), ] <- df2
df1
#    id V1 V2 V3
# 1 210  4  3  7
# 2 220 NA NA NA
# 3 230  2  0  1
# 4 240  4  3 NA
# 5 250  1  9  2
# 6 260  6  5 NA
# 7 270  0  3  3

Edit: As @plafort points out, you could avoid creating df2 in the first place, but I would go with vectorized approach instead of using apply. For example

indx <- rowSums(is.na(df1)) != (ncol(df1) - 1) & is.na(df1$V2)
df1[indx, "V2"] <- 3
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • is there any way to do this with dplyr or equiv? – Stephanie Owen Jul 16 '18 at 20:05
  • 1
    @StephanieOwen What is wrong with this solution? Is it slow? Is it hard to read? In other words why is the tool more important than the cause? – David Arenburg Jul 18 '18 at 21:10
  • 1
    Because my supervisor is really irritating and for some reason is obsessed with our lab group using the dplyr library and functions for all coding regardless of whether it works or not. nothing wrong with the solution itself dplyr functions are generally a little more readable though – Stephanie Owen Jul 23 '18 at 12:33
  • This is the most elegant solution I've ever seen because it's simple. One can always create a dummy variable and delete it when the task is done. – hnguyen Apr 20 '21 at 15:35
5

I had a matching syntax similar to @DavidArenburg with

df1[df1$id %in% df2$id, ] <- df2

Could it help to not have to create the extra data frame? If df2 is being created to fill in df1, you can quickly assign 3 to the appropriate regions without creating a new dataframe.

indx <- apply(df1, 1, function(x) is.na(x[3]) && any(!is.na(x[2:4])))
df1$V2[indx] <- 3
df1
   id V1 V2 V3
1 210  4  3  7
2 220 NA NA NA
3 230  2  0  1
4 240  4  3 NA
5 250  1  9  2
6 260  6  5 NA
7 270  0  3  3
Pierre L
  • 28,203
  • 6
  • 47
  • 69
0

In case someone needs dataframes to match in several columns using the using df1[match(dfid, dfid), ] <- df2 solution, as was my case, I used this trick

  df1$aux1<-paste(df1$id1,df1$id2) 
  df2$aux1<-paste(df2$id1,df2$id2) 
  df1[match(df2$aux1, df1$aux1), ] <- df2
  df1 <- df1[ , names(df1)!="aux1"]

This is probably very obvious, and not elegant, but it works