I'd like to use a lookup table to add a column with additional data using tidyr/dplyr functions. I've found some basic examples that perform this operation but can't get it to work with my data and I'm not really understanding the functions as the responses don't explain what is happening.
I want to match the id column in this data frame:
>df
id sample_name fpkm conf_hi conf_lo quant_status
1 XLOC_000118 T1 33.857900 62.323300 5.3925000 OK
2 XLOC_000118 T2 169.793000 395.783000 0.0000000 OK
3 XLOC_000118 T3 41.869200 69.395700 14.3427000 OK
4 XLOC_009095 T1 1.472500 3.076350 0.0000000 OK
5 XLOC_009095 T2 3.828400 8.171850 0.0000000 OK
6 XLOC_009095 T3 1.806010 4.055220 0.0000000 OK
...to the same values in this lookup table and add the name
value to df
in a new column where lookupTable$name
matches df$id
:
>lookupTable
id name
1 XLOC_000118 Xy13
2 XLOC_009104 Xy3
3 XLOC_009105 Zy3
4 XLOC_009095 Xy6
5 XLOC_018501 Xy9
6 XLOC_020049 Xy35
I attempted to adapt the code from this question but receive errors:
df %>%
gather(key = "col") %>%
left_join(ObLookup, by = "id") %>%
spread(key = id, value = name)
Error: `by` can't contain join column `id` which is missing from LHS
In addition: Warning message:
attributes are not identical across measure variables;
they will be dropped
I figured the following solution out on my own which produces the result that I want, but I would like to know if there is a solution using tidyr or dplyr:
> df$names <- lookupTable$name[match(df$id, lookupTable$id)]
> df
id sample_name fpkm conf_hi conf_lo quant_status names
1 XLOC_000118 T1 33.857900 62.323300 5.3925000 OK Obp13
2 XLOC_000118 T2 169.793000 395.783000 0.0000000 OK Obp13
3 XLOC_000118 T3 41.869200 69.395700 14.3427000 OK Obp13
4 XLOC_009095 T1 1.472500 3.076350 0.0000000 OK Obp6
5 XLOC_009095 T2 3.828400 8.171850 0.0000000 OK Obp6
6 XLOC_009095 T3 1.806010 4.055220 0.0000000 OK Obp6