1

I am trying to create a new column in an existing dataframe (df1) by comparing the values of column x to the values of column y in a different dataframe (df2).

The result should look like df_end. Where there is a match, the value of column x should be returned. Where there is no match, NA should be returned.

df1 <- data.frame(x = c("blue2", "blue6", "green9", "green7"))
df2 <- data.frame(y = c("blue2", "green9"))

df_end <- data.frame(x = c("blue2", "blue6", "green9", "green7"),
                     match = c("blue2", NA, "green9", NA))

I have experimented with merge, match and if,else statements, but I can't figure it out. Someone has a piece of advice for me?

#Attempt 1: Merge
df1$match <- merge(df1, df2, by.x = x, all = TRUE)

This does not work, because df1 and df2 are of different length.

Bugs
  • 4,491
  • 9
  • 32
  • 41
SHW
  • 461
  • 7
  • 26

1 Answers1

1

I did the following:

df1 <- data.frame(x = c("blue2", "blue6", "green9", "green7"))
df2 <- data.frame(y = c("blue2", "green9"))

end <- sapply(df1$x, function(x) { # for each value in df1$x
  j <- which(df2$y == x) # check if df2$y has a match
  ifelse(length(j) > 0, j, NA) # if there is, give the location in the vector
}) # if not give NA

cbind(df1,df2, match = df2$y[end]) # subset the df2 with the location to get the characters

#       x      y  match
#1  blue2  blue2  blue2
#2  blue6 green9   <NA>
#3 green9  blue2 green9
#4 green7 green9   <NA>

EDIT: see sotos' comment for the best answer: df2$y[match(df1$x, df2$y)]

Evan Friedland
  • 3,062
  • 1
  • 11
  • 25