2

My colleague and I are collecting data on separate files and we want to reconcile the data. Our data files look like so:

df1 = data.frame(Id = c(1:5), Score= c(10,NA,4,NA,3))
df2 = data.frame(UserID= c(1:5), Result= c(NA,8,NA,3,NA))

What is the simplest way to merge the two to form the following dataset to attain the following result?

df3 = data.frame(Id= c(1:5), Score= c(10,8,4,3,3))

Changing column names and using merge() don't seem to work which is what I had hoped.

Any suggestions? Would the quickest to be run a for loop across both datasets?

timnus
  • 197
  • 10
  • What about `coalesce`? Does this help: `df1$Score <- dplyr::coalesce(df1$Score, df2$Result)` – markus Nov 19 '18 at 08:58

4 Answers4

2

you are looking for the function coalesce in the library dplyr:

dplyr::coalesce(df1,df2)
  Id Score
1  1    10
2  2     8
3  3     4
4  4     3
5  5     3
Onyambu
  • 67,392
  • 3
  • 24
  • 53
1

use the dplyr library

library(dplyr)


inner_join(df1, df2, by = c("Id" = "UserID")) %>%  #specify the column names to join
  mutate(Score = ifelse (is.na(Score), Result, Score)) %>% #This will fill all the NA Scores with values of Result
  select(Id, Score) #Finally, Select the two columns you desire

Hope that helps

Wally Ali
  • 2,500
  • 1
  • 13
  • 20
0

Assuming that any NA in one df corresponds to a value in the other df, the simplest solution is by using an ifelse() statement to insert the values from df2 inlieu of the NA in df1, thus:

df1$Score <- ifelse(is.na(df1$Score), df2$Result, df1$Score)

This says: if df1$Score is NA, write the value of df2$Result, otherwise keep df1$Score

Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
0

With base R this should work as well:

df3 <- merge(df1, df2, by.x = c("Id", "Score"), by.y = c("UserID", "Result"), all = T)
df3 <- df3[!is.na(df3$Score),]
df3
Lennyy
  • 5,932
  • 2
  • 10
  • 23