I am trying to create a column in data frame df1 based on a match in another data frame df2, where df1 has many more rows than df2:
df1$val2 <- df2$val2[match(df1$id, df2$IDs)]
This doesn't quite work because the df2$IDs column is a list:
> df2
             IDs val2
1              0    1
2           1, 2    2
3           3, 4    3
4           5, 6    4
5           7, 8    5
6          9, 10    6
7 11, 12, 13, 14    7
The match only works for rows where the list element has length 1 (row 1, which str(df2) reports as "..$ : int 0"). For all other rows, match(df1$id, df2$IDs) returns NA.
Testing individual numbers against a single list element works fine with double brackets:
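A possible explanation, offered as a sketch rather than a confirmed diagnosis: according to the match() documentation, a list passed as the table is converted to a character vector before comparison, so a multi-element entry becomes one string instead of several separate values. The small list `tbl` below is a hypothetical stand-in for df2$IDs.

```r
# Hypothetical stand-in for the first two entries of df2$IDs.
tbl <- list(0L, c(1L, 2L))

# Inspect what the list looks like once coerced to character.
print(as.character(tbl))

# A single-element entry can still be matched...
hit <- match(0L, tbl)

# ...but a value inside a multi-element entry cannot.
miss <- match(2L, tbl)

print(hit)
print(miss)
```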
2 %in% df2[[2,'IDs']]
So I either need to restructure the df2$IDs column or perform the match differently. Both df1 and df2 have many other columns, but df2 has far fewer rows.
The case can be reproduced with the following:
IDs <- c("[0]", "[1, 2]", "[3, 4]", "[5, 6]", "[7, 8]", "[9, 10]", "[11, 12, 13, 14]")
val2 <- c(1,2,3,4,5,6,7)
df2 <- data.frame(IDs, val2)
df2$IDs <- lapply(strsplit(as.character(df2$IDs), ','), function (x) as.integer(gsub("\\s|\\[|\\]", "", x)))
id <- floor(runif(100, min=0, max=15))
df1 <- data.frame(id)
str(df1)
str(df2)
df1$val2 <- df2$val2[match(df1$id, df2$IDs)]
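One way the "modify the column" option could look, as a minimal sketch under the assumption that each ID appears in only one row of df2: flatten the list column into a long lookup table with one row per individual ID, then match against that. The name `lookup` and the fixed seed are illustrative additions, not part of the original data.

```r
# Rebuild the example data from the question.
IDs <- c("[0]", "[1, 2]", "[3, 4]", "[5, 6]", "[7, 8]", "[9, 10]", "[11, 12, 13, 14]")
val2 <- c(1, 2, 3, 4, 5, 6, 7)
df2 <- data.frame(IDs, val2, stringsAsFactors = FALSE)
df2$IDs <- lapply(strsplit(df2$IDs, ","),
                  function(x) as.integer(gsub("\\s|\\[|\\]", "", x)))

set.seed(1)  # assumption: fixed seed so the sketch is repeatable
df1 <- data.frame(id = floor(runif(100, min = 0, max = 15)))

# Flatten the list column: one row per individual ID,
# repeating each val2 once for every ID in its row.
# `lookup` is a hypothetical helper table.
lookup <- data.frame(
  id   = unlist(df2$IDs),
  val2 = rep(df2$val2, lengths(df2$IDs))
)

# Both arguments to match() are now plain atomic vectors.
df1$val2 <- lookup$val2[match(df1$id, lookup$id)]
```

Because df2 is short, the flattened table stays small, and the single vectorised match() call avoids looping over the rows of the much larger df1. If an ID could occur in more than one row of df2, match() would return only the first occurrence, so that case would need separate handling.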