0

I have two data.frame(panel data) down below

A <- data.table("ID"=rep(c("OO1", "002", "003", "004", "005"), each=3), 
                "DATE"=rep(c(200802, 200803, 200804), time=5), 
                "Weight"=rep(c(60, 55, 80, 73, 62), each=3)); A 

     ID   DATE Weight
 1: OO1 200802     60
 2: OO1 200803     60
 3: OO1 200804     60
 4: 002 200802     55
 5: 002 200803     55
 6: 002 200804     55
 7: 003 200802     80
 8: 003 200803     80
 9: 003 200804     80
10: 004 200802     73
11: 004 200803     73
12: 004 200804     73
13: 005 200802     62
14: 005 200803     62
15: 005 200804     62

and

B <- data.table("ID"=rep(c("002", "003", "004", "007", "008"), each=3), 
                "DATE"=rep(c(200806, 200807, 200808), time=5), 
                "EXERCISE"=rep(c("sleep", "studying", 
                                 "movie", "jogging", "basketball"), each=3)); B
     ID   DATE   EXERCISE
 1: 002 200806      sleep
 2: 002 200807      sleep
 3: 002 200808      sleep
 4: 003 200806   studying
 5: 003 200807   studying
 6: 003 200808   studying
 7: 004 200806      movie
 8: 004 200807      movie
 9: 004 200808      movie
10: 007 200806    jogging
11: 007 200807    jogging
12: 007 200808    jogging
13: 008 200806 basketball
14: 008 200807 basketball
15: 008 200808 basketball

I try to merge these two data.frame, and I only want to reserve the information of data.frame A,that is ID=001, 002, 003, 004, 005. So, what I expected the data is

    ID   DATE Weight   EXERCISE
1  OO1 200802     60       <NA>
2  OO1 200803     60       <NA>
3  OO1 200804     60       <NA>
4  002 200802     55       <NA>
5  002 200803     55       <NA>
6  002 200804     55       <NA>
7  002 200806     NA      sleep
8  002 200807     NA      sleep
9  002 200808     NA      sleep
10 003 200802     80       <NA>
11 003 200803     80       <NA>
12 003 200804     80       <NA>
13 003 200806     NA   studying
14 003 200807     NA   studying
15 003 200808     NA   studying
16 004 200802     73       <NA>
17 004 200803     73       <NA>
18 004 200804     73       <NA>
19 004 200806     NA      movie
20 004 200807     NA      movie
21 004 200808     NA      movie
22 005 200802     62       <NA>
23 005 200803     62       <NA>
24 005 200804     62       <NA>

but my code's result is like

C <- merge(A, B, by="ID", all.x=T); C
    ID DATE.x Weight DATE.y EXERCISE
1  002 200802     55 200806    sleep
2  002 200802     55 200807    sleep
3  002 200802     55 200808    sleep
4  002 200803     55 200806    sleep
5  002 200803     55 200807    sleep
6  002 200803     55 200808    sleep
7  002 200804     55 200806    sleep
8  002 200804     55 200807    sleep
9  002 200804     55 200808    sleep
10 003 200803     80 200806 studying
11 003 200803     80 200807 studying
12 003 200803     80 200808 studying
13 003 200802     80 200806 studying
14 003 200802     80 200807 studying
15 003 200802     80 200808 studying
16 003 200804     80 200806 studying
17 003 200804     80 200807 studying
18 003 200804     80 200808 studying
19 004 200804     73 200807    movie
20 004 200804     73 200808    movie
21 004 200804     73 200806    movie
22 004 200802     73 200807    movie
23 004 200802     73 200808    movie
24 004 200802     73 200806    movie
25 004 200803     73 200807    movie
26 004 200803     73 200808    movie
27 004 200803     73 200806    movie
28 005 200802     62     NA     <NA>
29 005 200803     62     NA     <NA>
30 005 200804     62     NA     <NA>
31 OO1 200802     60     NA     <NA>
32 OO1 200803     60     NA     <NA>
33 OO1 200804     60     NA     <NA>

and I try another way, no [by="ID"],

> C <- merge(A, B, all.x=T, all.y=T); C
    ID   DATE Weight   EXERCISE
1  002 200802     55       <NA>
2  002 200803     55       <NA>
3  002 200804     55       <NA>
4  002 200806     NA      sleep
5  002 200807     NA      sleep
6  002 200808     NA      sleep
7  003 200802     80       <NA>
8  003 200803     80       <NA>
9  003 200804     80       <NA>
10 003 200806     NA   studying
11 003 200807     NA   studying
12 003 200808     NA   studying
13 004 200802     73       <NA>
14 004 200803     73       <NA>
15 004 200804     73       <NA>
16 004 200806     NA      movie
17 004 200807     NA      movie
18 004 200808     NA      movie
19 005 200802     62       <NA>
20 005 200803     62       <NA>
21 005 200804     62       <NA>
22 007 200806     NA    jogging
23 007 200807     NA    jogging
24 007 200808     NA    jogging
25 008 200806     NA basketball
26 008 200807     NA basketball
27 008 200808     NA basketball
28 OO1 200802     60       <NA>
29 OO1 200803     60       <NA>
30 OO1 200804     60       <NA>

it's close to what I expected, but didn't reserve only A's ID, so I'm curios about what is the difference between merge and merge, by in R, and how to fix it?

CHIA
  • 59
  • 5
  • Seems like you just want `C <- merge(A, B, all.x=T)`. There options are all described in the `?merge` help page. So this existing question might help explain a bit more: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – MrFlick Feb 17 '22 at 03:23
  • The by parameter species the name or names of the matching criteria. If none are specified the default is to use all matching column names. This should be described in the help page. – IRTFM Feb 17 '22 at 03:24
  • 1
    Indeed `?merge` - "*By default the data frames are merged on the columns with names they both have...*" – thelatemail Feb 17 '22 at 03:25

0 Answers0