I have a very large dataset with a few thousand missing values, which looks something like the below:
df1:
doi journal year
1 10.1037/0002-9432.76.1.13 <NA> NA
2 10.1037/0002-9432.76.1.13 <NA> NA
3 10.1037/0002-9432.76.1.13 <NA> NA
4 10.1037/0003-066X.60.8.750 <NA> NA
5 10.1037/0003-066X.60.8.750 <NA> NA
6 10.1037/0003-066X.60.8.750 <NA> NA
I have another dataframe which contains all missing journal names and years:
df2:
doi year journal
17 10.1037/0002-9432.76.1.13 2006 American Journal of Orthopsychiatry
18 10.1037/0002-9432.76.1.13 2006 American Journal of Orthopsychiatry
19 10.1037/0002-9432.76.1.13 2006 American Journal of Orthopsychiatry
31 10.1037/0003-066x.60.8.750 2005 American Psychologist
32 10.1037/0003-066x.60.8.750 2005 American Psychologist
33 10.1037/0003-066x.60.8.750 2005 American Psychologist
Yet when I try to match the two by their doi value
df1$year[is.na(df1$year)] <- df2$year[match(df1$doi[is.na(df1$year)], df2$doi)]
df1$journal[is.na(df1$journal)] <- df2$journal[match(df1$doi[is.na(df1$journal)], df2$doi)]
this only works for some:
Result:
doi journal year
1 10.1037/0002-9432.76.1.13 American Journal of Orthopsychiatry 2006
2 10.1037/0002-9432.76.1.13 American Journal of Orthopsychiatry 2006
3 10.1037/0002-9432.76.1.13 American Journal of Orthopsychiatry 2006
4 10.1037/0003-066X.60.8.750 <NA> NA
5 10.1037/0003-066X.60.8.750 <NA> NA
6 10.1037/0003-066X.60.8.750 <NA> NA
I have tried different ways of matching the dataframes (like this or this), as well as trimming the white-space when loading the dataframes, but no success. "doi" and "journal" are character vectors, and "year" is an integer. Very thankful if anyone has some insight.