18

I want to add variables from dat2:

          concreteness familiarity typicality
amoeba            3.60        1.30       1.71
bacterium         3.82        3.48       2.13
leech             5.71        1.83       4.50

To dat1:

    ID  variable value
1    1    amoeba     0
2    2    amoeba     0
3    3    amoeba    NA
251  1 bacterium     0
252  2 bacterium     0
253  3 bacterium     0
501  1     leech     1
502  2     leech     1
503  3     leech     0

Giving the following output:

    X ID  variable value concreteness familiarity typicality
1   1  1    amoeba     0         3.60        1.30       1.71
2   2  2    amoeba     0         3.60        1.30       1.71
3   3  3    amoeba    NA         3.60        1.30       1.71
4 251  1 bacterium     0         3.82        3.48       2.13
5 252  2 bacterium     0         3.82        3.48       2.13
6 253  3 bacterium     0         3.82        3.48       2.13
7 501  1     leech     1         5.71        1.83       4.50
8 502  2     leech     1         5.71        1.83       4.50
9 503  3     leech     0         5.71        1.83       4.50

As you can see the info from dat1 has to be replicated over several rows in dat2.

This was my failed attempt:

dat3 <- merge(dat1, dat2, by=intersect(dat1$variable(dat1), dat2$row.names(dat2)))

Givng the following error:

Error in as.vector(y) : attempt to apply non-function

Please find replicate examples here:

dat1:

structure(list(ID = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), variable = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("amoeba", "bacterium", 
"leech", "centipede", "lizard", "tapeworm", "head lice", "maggot", 
"ant", "moth", "mosquito", "earthworm", "caterpillar", "scorpion", 
"snail", "spider", "grasshopper", "dust mite", "tarantula", "termite", 
"bat", "wasp", "silkworm"), class = "factor"), value = c(0L, 
0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)), .Names = c("ID", "variable", 
"value"), row.names = c(1L, 2L, 3L, 251L, 252L, 253L, 501L, 502L, 
503L), class = "data.frame")

dat2:

structure(list(concreteness = c(3.6, 3.82, 5.71), familiarity = c(1.3, 
3.48, 1.83), typicality = c(1.71, 2.13, 4.5)), .Names = c("concreteness", 
"familiarity", "typicality"), row.names = c("amoeba", "bacterium", 
"leech"), class = "data.frame")
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
SarahDew
  • 426
  • 2
  • 4
  • 15

3 Answers3

16

You could add a join variable to dat2 then using merge:

dat2$variable <- rownames(dat2)
merge(dat1, dat2)
   variable ID value concreteness familiarity typicality
1    amoeba  1     0         3.60        1.30       1.71
2    amoeba  2     0         3.60        1.30       1.71
3    amoeba  3    NA         3.60        1.30       1.71
4 bacterium  1     0         3.82        3.48       2.13
5 bacterium  2     0         3.82        3.48       2.13
6 bacterium  3     0         3.82        3.48       2.13
7     leech  1     1         5.71        1.83       4.50
8     leech  2     1         5.71        1.83       4.50
9     leech  3     0         5.71        1.83       4.50
agstudy
  • 119,832
  • 17
  • 199
  • 261
12

Try this:

merge(dat1, dat2, by.x = 2, by.y = 0, all.x = TRUE)

This assumes that if there are any rows in dat1 that are unmatched then the dat2 columns in the result should be filled with NA and if there are unmatched values in dat2 then they are disregarded. For example:

dat2a <- dat2
rownames(2a)[3] <- "elephant"
# the above still works:
merge(dat1, dat2a, by.x = 2, by.y = 0, all.x = TRUE)

The above is known as a left join in SQL and can be done like this in sqldf (ignore the warning):

library(sqldf)
sqldf("select * 
         from dat1 left join dat2 
         on dat1.variable = dat2.row_names", 
       row.names = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
7

Nothing wrong with @agstudy's answer, but you can do it without actually modifying dat2 by creating an anonymous temporary. Adding X is similar:

> merge(cbind(dat1, X=rownames(dat1)), cbind(dat2, variable=rownames(dat2)))
   variable ID value   X concreteness familiarity typicality
1    amoeba  1     0   1         3.60        1.30       1.71
2    amoeba  2     0   2         3.60        1.30       1.71
3    amoeba  3    NA   3         3.60        1.30       1.71
4 bacterium  1     0 251         3.82        3.48       2.13
5 bacterium  2     0 252         3.82        3.48       2.13
6 bacterium  3     0 253         3.82        3.48       2.13
7     leech  1     1 501         5.71        1.83       4.50
8     leech  2     1 502         5.71        1.83       4.50
9     leech  3     0 503         5.71        1.83       4.50
Matthew Lundberg
  • 42,009
  • 6
  • 90
  • 112
  • Similar question. I have multiple data frames which all are "per week" records. I'd like to merge them all and plot them into one data frame. How would I go about merging them? – kiwicomb123 Oct 02 '19 at 06:57
  • @kiwicomb123 Here and in agstudy's answer, the key is to have a column with the same name in each data frame, which ties the results together. `variable` already exists in `dat1` so it is added to `dat2` in order to do the merge. `X` is added to `dat1` simply to match the output in the question. (Note that there are no column names other than `variable` in common in the modified data frames here, so `merge` works with the default `by` argument.) – Matthew Lundberg Oct 02 '19 at 18:11