0

I have two data frames: a.

     Group n
         1 3
         2 3
         3 4

b.

Group  n
    1
    1 
    1
    2
    2
    2
    3
    3
    3
    3
   

I want to transfer the column n from data frame a) to data frame b), such that I end up with:

Group  n
    1  3
    1  3
    1  3
    2  3
    2  3
    2  3
    3  4
    3  4
    3  4
    3  4

How can I achieve this using R?

Thanks!

janina
  • 65
  • 1
  • 1
  • 4

1 Answers1

1

Like this?

library(tidyverse)
df1 <- tribble(~"Group", ~"n",
        1, 3,
        2, 3,
        3, 4)
df2 <- tribble(~"Group",
               1,
               1,
               1,
               2,
               2,
               2,
               3,
               3,
               3)
df3 <- dplyr::left_join(df1, df2, by = "Group")
df3
## A tibble: 9 x 2
#  Group     n
#  <dbl> <dbl>
#1     1     3
#2     1     3
#3     1     3
#4     2     3
#5     2     3
#6     2     3
#7     3     4
#8     3     4
#9     3     4

Edit

If the dataframes you want to merge are very large, the data.table package can offer a significant speedup, e.g.

#install.packages("microbenchmark")
library(microbenchmark)
#install.packages("data.table")
library(data.table)


dplyr_func <- function(){
  df1 <- tribble(~"Group", ~"n",
                 1, 3,
                 2, 3,
                 3, 4)
  df2 <- tribble(~"Group",
                 1,
                 1,
                 1,
                 2,
                 2,
                 2,
                 3,
                 3,
                 3)  
  left_join(df1, df2, by = "Group")
}

dt_func <- function(){
  df1 <- tribble(~"Group", ~"n",
                 1, 3,
                 2, 3,
                 3, 4)
  df2 <- tribble(~"Group",
                 1,
                 1,
                 1,
                 2,
                 2,
                 2,
                 3,
                 3,
                 3)
  df1 <- setDT(df1)
  df2 <- setDT(df2)
  df3 <- df1[df2, on='Group']
}

microbenchmark(dplyr_func(), dt_func())
#Unit: milliseconds
#         expr      min       lq     mean   median       uq       max neval
# dplyr_func() 2.860449 3.005662 3.550877 3.183445 3.444416 16.001694   100
#    dt_func() 1.479301 1.570013 1.808001 1.698145 1.807476  6.958431   100

jared_mamrot
  • 22,354
  • 4
  • 21
  • 46
  • Thanks, this gives me exactly what I was looking for! Is there a way I could do this without copying the data frames? This is already super helpful though. – janina May 03 '21 at 03:17
  • What do you mean by "without copying the dataframes"? Are the dataframes large? (i.e. millions of rows) – jared_mamrot May 03 '21 at 03:24
  • Do you mean how I created df1 / df2? You don't need to create data if you already have data - that is just to show an example of a strategy you can implement - if you already have two dataframes, and they both contain a column named e.g. "Group", then you can just use `new_dataframe <- left_join(first_dataframe, second_dataframe, by = "Group")` or the higher performance data.table version that I just added to my answer `df1 <- setDT(first_dataframe); df2 <- setDT(second_dataframe); new_dataframe <- df1[df2, on='Group']` – jared_mamrot May 03 '21 at 03:45
  • Great, thank you! That was exactly my question that I was trying to express coherently. This was really helpful! – janina May 03 '21 at 03:47