Merge dataframes, different lengths

Question

I want to add variables from dat2:

          concreteness familiarity typicality
amoeba            3.60        1.30       1.71
bacterium         3.82        3.48       2.13
leech             5.71        1.83       4.50

To dat1:

    ID  variable value
1    1    amoeba     0
2    2    amoeba     0
3    3    amoeba    NA
251  1 bacterium     0
252  2 bacterium     0
253  3 bacterium     0
501  1     leech     1
502  2     leech     1
503  3     leech     0

Giving the following output:

    X ID  variable value concreteness familiarity typicality
1   1  1    amoeba     0         3.60        1.30       1.71
2   2  2    amoeba     0         3.60        1.30       1.71
3   3  3    amoeba    NA         3.60        1.30       1.71
4 251  1 bacterium     0         3.82        3.48       2.13
5 252  2 bacterium     0         3.82        3.48       2.13
6 253  3 bacterium     0         3.82        3.48       2.13
7 501  1     leech     1         5.71        1.83       4.50
8 502  2     leech     1         5.71        1.83       4.50
9 503  3     leech     0         5.71        1.83       4.50

As you can see the info from dat1 has to be replicated over several rows in dat2.

This was my failed attempt:

dat3 <- merge(dat1, dat2, by=intersect(dat1$variable(dat1), dat2$row.names(dat2)))

Givng the following error:

Error in as.vector(y) : attempt to apply non-function

Please find replicate examples here:

dat1:

structure(list(ID = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), variable = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), .Label = c("amoeba", "bacterium", 
"leech", "centipede", "lizard", "tapeworm", "head lice", "maggot", 
"ant", "moth", "mosquito", "earthworm", "caterpillar", "scorpion", 
"snail", "spider", "grasshopper", "dust mite", "tarantula", "termite", 
"bat", "wasp", "silkworm"), class = "factor"), value = c(0L, 
0L, NA, 0L, 0L, 0L, 1L, 1L, 0L)), .Names = c("ID", "variable", 
"value"), row.names = c(1L, 2L, 3L, 251L, 252L, 253L, 501L, 502L, 
503L), class = "data.frame")

dat2:

structure(list(concreteness = c(3.6, 3.82, 5.71), familiarity = c(1.3, 
3.48, 1.83), typicality = c(1.71, 2.13, 4.5)), .Names = c("concreteness", 
"familiarity", "typicality"), row.names = c("amoeba", "bacterium", 
"leech"), class = "data.frame")

agstudy · Accepted Answer · 2015-10-06T12:39:57.937

You could add a join variable to dat2 then using merge:

dat2$variable <- rownames(dat2)
merge(dat1, dat2)
   variable ID value concreteness familiarity typicality
1    amoeba  1     0         3.60        1.30       1.71
2    amoeba  2     0         3.60        1.30       1.71
3    amoeba  3    NA         3.60        1.30       1.71
4 bacterium  1     0         3.82        3.48       2.13
5 bacterium  2     0         3.82        3.48       2.13
6 bacterium  3     0         3.82        3.48       2.13
7     leech  1     1         5.71        1.83       4.50
8     leech  2     1         5.71        1.83       4.50
9     leech  3     0         5.71        1.83       4.50

This answer works with the sample data shown but would drop all unmatched rows in `dat1` if there were any. — G. Grothendieck, Dec 31 '12 at 15:48

score 12 · Answer 2 · answered Dec 31 '12 at 15:33

Try this:

merge(dat1, dat2, by.x = 2, by.y = 0, all.x = TRUE)

This assumes that if there are any rows in dat1 that are unmatched then the dat2 columns in the result should be filled with NA and if there are unmatched values in dat2 then they are disregarded. For example:

dat2a <- dat2
rownames(2a)[3] <- "elephant"
# the above still works:
merge(dat1, dat2a, by.x = 2, by.y = 0, all.x = TRUE)

The above is known as a left join in SQL and can be done like this in sqldf (ignore the warning):

library(sqldf)
sqldf("select * 
         from dat1 left join dat2 
         on dat1.variable = dat2.row_names", 
       row.names = TRUE)

score 7 · Answer 3 · answered Dec 31 '12 at 14:26

7

Nothing wrong with @agstudy's answer, but you can do it without actually modifying dat2 by creating an anonymous temporary. Adding X is similar:

> merge(cbind(dat1, X=rownames(dat1)), cbind(dat2, variable=rownames(dat2)))
   variable ID value   X concreteness familiarity typicality
1    amoeba  1     0   1         3.60        1.30       1.71
2    amoeba  2     0   2         3.60        1.30       1.71
3    amoeba  3    NA   3         3.60        1.30       1.71
4 bacterium  1     0 251         3.82        3.48       2.13
5 bacterium  2     0 252         3.82        3.48       2.13
6 bacterium  3     0 253         3.82        3.48       2.13
7     leech  1     1 501         5.71        1.83       4.50
8     leech  2     1 502         5.71        1.83       4.50
9     leech  3     0 503         5.71        1.83       4.50

answered Dec 31 '12 at 14:26

Matthew Lundberg

42,009
6
90
112

Similar question. I have multiple data frames which all are "per week" records. I'd like to merge them all and plot them into one data frame. How would I go about merging them? – kiwicomb123 Oct 02 '19 at 06:57
@kiwicomb123 Here and in agstudy's answer, the key is to have a column with the same name in each data frame, which ties the results together. `variable` already exists in `dat1` so it is added to `dat2` in order to do the merge. `X` is added to `dat1` simply to match the output in the question. (Note that there are no column names other than `variable` in common in the modified data frames here, so `merge` works with the default `by` argument.) – Matthew Lundberg Oct 02 '19 at 18:11

Merge dataframes, different lengths

3 Answers3

Linked

Related