0

I have the following dataframes: df1:

Animal Farm Year    y  
1      A      1    23
2      A      2    24    
3      B      1    13   
4      B      3    28 
5      C      1    25

df2

Farm Year Month  C   Mm  Pp
 A      1    1    23  15  4
 A      1    2    25  18  5
 A      1    3    13  35  7
 A      1    4    18  38  4

df3

Farm Year Month  C   Mm  Pp
 B      1    1    12  13  3
 B      1    2    22  11  2
 B      1    3    14  17  5
 B      1    4    18  38  6

df4

 Farm Year Month  C   Mm  Pp
     C      3    1    12  13  3
     C      3    2    22  11  2
     C      3    3    14  17  5
     C      3    4    18  38  6

I want the following for all animals and farms present in df1: ex:

df5

 Animal Farm  Year  month  C Mm  Pp y
    1   A      1    1    23  15  4 23
    1   A      1    2    25  18  5 23
    1   A      1    3    13  35  7 23
    1   A      1    4    18  38  4 23
    3   B      1    1    12  13  3 13
    3   B      1    2    22  11  2 13
    3   B      1    3    14  17  5 13
    3   B      1    4    18  38  6 13

I tried

df1 %>% full_join(df2, by= c("farm", "year")) %>%
full_join(df3, by = c("farm, "year", "month")) %>%
full_join(df4, by = c("farm, "year", "month")) --> df5

but it did't work correctly. So, I also tried:

library(gtools)

df5 <- smartbind(df1,df2, df3, df4)
Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
Johanna Ramirez
  • 161
  • 1
  • 9
  • 2
    In your last `full_join`, you're missing the `"` after `farm`. You also have the issue that your columns are capitalized `Farm` and `Year`, but your code uses lower case `farm` and `year`. – Gregor Thomas Dec 29 '21 at 21:33
  • 3
    However, since there isn't really a "join" between `df2`, `df3`, and `df4` - they just need to be stacked, I'd suggest `bind_rows(df2, df3, df4) %>% full_join(df1)`. Since the column names to join on are the same in `df1` and the others, the default to join on all matching column names should work fine. – Gregor Thomas Dec 29 '21 at 21:35
  • 1
    I think it would be useful to read about the types of joins and merges, see https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272. Once you better understand the "calculus" of merges, it will become more apparent what steps are needed. In the case of this question, it is just one merge/join after combining three frames, exactly as GregorThomas demonstrated in their code. – r2evans Dec 29 '21 at 21:41

2 Answers2

1

Here a way to go with data.table:

library(data.table)

df5 = merge(df1, rbindlist(list(df2,df3,df4)), by = c("Farm", "Year"))

Output:

> df5
   Farm Year Animal  y Month  C Mm Pp
1:    A    1      1 23     1 23 15  4
2:    A    1      1 23     2 25 18  5
3:    A    1      1 23     3 13 35  7
4:    A    1      1 23     4 18 38  4
5:    B    1      3 13     1 12 13  3
6:    B    1      3 13     2 22 11  2
7:    B    1      3 13     3 14 17  5
8:    B    1      3 13     4 18 38  6

Update, if you want to see all rows, whether it's matched or not:

> merge(df1, rbindlist(list(df2,df3,df4)), by = c("Farm", "Year"), all=TRUE)
    Farm Year Animal  y Month  C Mm Pp
 1:        NA     NA NA    NA NA NA NA
 2:    A    1      1 23     1 23 15  4
 3:    A    1      1 23     2 25 18  5
 4:    A    1      1 23     3 13 35  7
 5:    A    1      1 23     4 18 38  4
 6:    A    2      2 24    NA NA NA NA
 7:    B    1      3 13     1 12 13  3
 8:    B    1      3 13     2 22 11  2
 9:    B    1      3 13     3 14 17  5
10:    B    1      3 13     4 18 38  6
11:    B    3      4 28    NA NA NA NA
12:    C    1      5 25    NA NA NA NA
13:    C    3     NA NA     1 12 13  3
14:    C    3     NA NA     2 22 11  2
15:    C    3     NA NA     3 14 17  5
16:    C    3     NA NA     4 18 38  6
Marco_CH
  • 3,243
  • 8
  • 25
  • 2
    Your code does not parse; in your shift from `rbind(rbind(...),...)`, you left an extra `df4` and also an extra paren. Also, I don't understand why `!is.na(.)` (etc) is needed here. – r2evans Dec 29 '21 at 21:47
  • 1
    Thank you r2evans! Just fixed it. Was maybe too late for me... – Marco_CH Dec 29 '21 at 22:18
  • @Marco_CH, thanks for your help, but the code only works with the first two files.. the information of df3 and df4 are not inserted, any recommendation? – Johanna Ramirez Dec 30 '21 at 15:48
  • df3 is inserted if I test it. df4 not, because there is no link between df1 and df4 (in df1 is no Farm C and Month 3). If you want to see everything you have to add `all=TRUE` (see update). – Marco_CH Dec 30 '21 at 15:53
  • Just tested it again and converted the df1 to data.frame (instead of data.table) and I still get exact the result you've expected in your initial post. If you're having still issues, please copy your code. – Marco_CH Dec 30 '21 at 16:03
1

If you can share your dataframes so they can be reproduced, it would be helpful.

Here is an example:

mtcars[1:3,1:3] %>% dput

structure(list(mpg = c(21, 21, 22.8), cyl = c(6, 6, 4), disp = c(160, 
160, 108)), row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710"
), class = "data.frame")
Antex
  • 1,364
  • 4
  • 18
  • 35