0

I have two data frames. df1 is like this:

ID  Score
 A     NA
 A     NA
 B     NA
 B     NA
 B     NA
 B     NA
 C     NA
 C     NA

df2 looks like this:

Name Score
   A    70
   B    54
   C    99

I want df3 to look like this:

ID  Score
 A     70
 A     70
 B     54
 B     54
 B     54
 B     54
 C     99
 C     99

I was thinking of for looping over df1 and using the name to search in df2, but a for loop would take forever, given that my actual data frame has many rows.

Note that I want df3 to maintain the structure of df1. A, B, and C should appear the same number of times.

Mikael Jagan
  • 9,012
  • 2
  • 17
  • 48
TIC-FLY
  • 125
  • 1
  • 10
  • 3
    Does this answer your question? [How to join (merge) data frames (inner, outer, left, right)](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – divibisan Mar 23 '23 at 20:46
  • 2
    In your example, you want to drop the `Score` column in df1 and then `left_join(df1, df2, on=c('ID'='Name')` – divibisan Mar 23 '23 at 20:48
  • @divibisan Thanks for your speedy response!, does this preserve the different structures of both that A only appears once in df2 but twice in df1. Sorry if this sounds silly, i'm rather new to R. – TIC-FLY Mar 23 '23 at 20:49
  • 1
    In base R you could overwrite the `NA` values with `df1$Score <- df2[match(df1$ID, df2$Name), "Score"]` – jpsmith Mar 23 '23 at 20:56
  • @divibisan. Yours seem to be throwing this error on the 2nd bit: Error in `left_join()`: ! `...` must be empty. ✖ Problematic argument: • on = c(ID = "Name") @jpsmith Thanks for your response. This doesn't seem to work. I'm not sure why but df1's scores remain as NA. – TIC-FLY Mar 23 '23 at 21:01
  • 1
    @TIC-FLY The solution by @jpsmith worked for me. As did the `merge()` and `left_join()` solutions in the answer from @Andre Wildberg below. Try running them again on these sample dfs I used to see if your problem persists `df1 <- data.frame(ID = c(rep("A",3), rep("B",3), rep("C",3)), Score = NA)` and `df2 <- data.frame(Name = c("A", "B", "C"), Score = c(70, 54, 99))`. – L Tyrone Mar 23 '23 at 21:51
  • Managed to get it working too with some extra steps. Thanks, @LeroyTyrone – TIC-FLY Mar 23 '23 at 22:17

1 Answers1

2

Try with merge and only column ID from df1

merge(df1[,"ID",drop=F], df2, by.x="ID", by.y="Name")
  ID Score
1  A    70
2  A    70
3  B    54
4  B    54
5  B    54
6  B    54
7  C    99
8  C    99

With dplyr

library(dplyr)

left_join(df1[,"ID",drop=F], df2, c("ID" = "Name"))
  ID Score
1  A    70
2  A    70
3  B    54
4  B    54
5  B    54
6  B    54
7  C    99
8  C    99

Data

df1 <- structure(list(ID = c("A", "A", "B", "B", "B", "B", "C", "C"), 
    Score = c(NA, NA, NA, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, 
-8L))

df2 <- structure(list(Name = c("A", "B", "C"), Score = c(70L, 54L, 99L
)), class = "data.frame", row.names = c(NA, -3L))
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29
  • @TIC-FLY check the output if you use `ID` instead of `id`. Maybe just a typo? It should not create a column `id.y` given the inputs. – Andre Wildberg Mar 23 '23 at 22:10
  • Sorry @Andre Wildberg. For some reason what i'm getting is the score column basically becoming an ID column so it being the score values being 1 for A, 2 for B, 3 for C rather than 70 for A, 54 for B and 99 for C. EDIT: It seems like in my version, I had an extra ID column with was messing up the function. It does when I got rid of it. – TIC-FLY Mar 23 '23 at 22:12