I am new to R and am trying to combine 3 data frames that share some column names. Example data frames are shown below.

dfA =

user_id    log_id    Coding    Games    Storytelling    NA
001        1234      1          1             0         0
001        1235      0          0             1         0
002        1236      1          1             1         0
002        1237      0          0             0         1

dfB =

user_id    log_id    Coding    Media    Storytelling    NA
003        1238      0          1             1         0
003        1239      0          0             0         1
003        1240      1          1             1         0
004        1241      0          1             0         0
004        1242      1          1             1         0

dfC =

user_id    log_id   Number    Search   Storytelling    NA
001        1243      1          1             0         0
001        1244      0          0             1         0
003        1245      1          1             0         0
005        1246      0          0             0         1
006        1247      0          0             1         0
006        1248      1          0             1         1
007        1249      0          0             0         1

I need to join these data frames by user_id and log_id to get the following result.

user_id  log_id  Coding  Media    Number    Search   Games     Storytelling    NA
 001      1234     1      0         0         0        1            0          0
 001      1235     0      0         0         0        0            1          0
 001      1243     0      0         1         1        0            0          0
 001      1244     0      0         0         0        0            1          0
 002      1236     1      0         0         0        1            1          0
 002      1237     0      0         0         0        0            0          1
 003      1238     0      1         0         0        0            1          0
 003      1239     0      0         0         0        0            0          1
 003      1240     1      1         0         0        0            1          0
 003      1245     0      0         1         1        0            0          0
 004      1241     0      1         0         0        0            0          0
 004      1242     1      1         0         0        0            1          0
 005      1246     0      0         0         0        0            0          1
 006      1247     0      0         0         0        0            1          0
 006      1248     0      0         1         0        0            1          1
 007      1249     0      0         0         0        0            0          1

All columns other than user_id and log_id are categories. The full set of possible categories is

Total_Category = Coding, Games, Storytelling, Search, Media, Number, NA

When I join the 3 data frames by user_id and log_id, the variables with the same names get overwritten in the joins. I am not sure how to fix this.

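library(plyr)  # provides join()

datalist_final <- list(dfA, dfB, dfC)
# Full join on the two key columns; same-named category columns get overwritten
final_user_logs <- Reduce(function(...) {
  join(..., by = c("user_id", "log_id"), type = "full")
}, datalist_final)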

I am thinking of renaming the variables in each data frame to something like coding_dfA, coding_dfB, and coding_dfC, doing the join, and then summing coding_dfA, coding_dfB, and coding_dfC to get "coding". But that means a lot of hard-coding, and if a category changes in any of the data frames on the next run, I will have to change everything.
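A minimal sketch of this renaming idea, with the suffixes generated from the list names rather than hard-coded. It is only an illustration under assumptions: it uses base R merge() in place of join(), works on toy two-row versions of dfA and dfB, and the names key_cols, renamed, joined, and final are made up for the example.

```r
# Toy two-row versions of dfA and dfB (values taken from the examples above).
dfA <- data.frame(c("001", "001"), c(1234, 1235), c(1, 0), c(1, 0), c(0, 1), c(0, 0),
                  stringsAsFactors = FALSE)
names(dfA) <- c("user_id", "log_id", "Coding", "Games", "Storytelling", "NA")
dfB <- data.frame(c("003", "003"), c(1238, 1239), c(0, 0), c(1, 0), c(1, 0), c(0, 1),
                  stringsAsFactors = FALSE)
names(dfB) <- c("user_id", "log_id", "Coding", "Media", "Storytelling", "NA")

key_cols <- c("user_id", "log_id")   # assumed join keys
datalist <- list(dfA = dfA, dfB = dfB)

# Suffix every category column with the name of its data frame, e.g. Coding_dfA.
renamed <- Map(function(df, nm) {
  is_cat <- !(names(df) %in% key_cols)
  names(df)[is_cat] <- paste(names(df)[is_cat], nm, sep = "_")
  df
}, datalist, names(datalist))

# Full outer join on the keys; category columns no longer collide.
joined <- Reduce(function(x, y) merge(x, y, by = key_cols, all = TRUE), renamed)

# Collapse the suffixed columns back into one column per category.
# Cells that are NA after the join (category absent for that row) count as 0.
cats <- unique(unlist(lapply(datalist, function(df) setdiff(names(df), key_cols))))
final <- joined[key_cols]
for (category in cats) {
  cols <- intersect(paste(category, names(datalist), sep = "_"), names(joined))
  final[[category]] <- rowSums(joined[cols], na.rm = TRUE)
}
```

Because the suffixes and the category list are both derived from the data, a category that appears or disappears in a later run should not require code changes.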

Another option I can think of is rbind. Since the user_id and log_id combinations are unique, I could stack the data frames on top of each other, but then I would have to manually create the columns for the categories missing from each data frame. Is it possible to automatically check the list of categories and, when a data frame has no variable with a given name, create it, so that I can stack the data frames instead of joining them?
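The stacking idea could look roughly like the sketch below, written in base R on toy two-row versions of dfA and dfB. Treat it as an assumption-laden illustration: add_missing_cats is a hypothetical helper, the category list is derived from the column names rather than taken from Total_Category, and missing categories are filled with 0.

```r
# Toy two-row versions of dfA and dfB (values taken from the examples above).
dfA <- data.frame(c("001", "001"), c(1234, 1235), c(1, 0), c(1, 0), c(0, 1), c(0, 0),
                  stringsAsFactors = FALSE)
names(dfA) <- c("user_id", "log_id", "Coding", "Games", "Storytelling", "NA")
dfB <- data.frame(c("003", "003"), c(1238, 1239), c(0, 0), c(1, 0), c(1, 0), c(0, 1),
                  stringsAsFactors = FALSE)
names(dfB) <- c("user_id", "log_id", "Coding", "Media", "Storytelling", "NA")

key_cols <- c("user_id", "log_id")   # assumed key columns
datalist <- list(dfA, dfB)

# Derive the full category list from the data instead of hard-coding it.
all_cats <- setdiff(unique(unlist(lapply(datalist, names))), key_cols)

# Hypothetical helper: add any category column a data frame lacks (filled
# with 0), then put the columns in one common order so rbind() can stack them.
add_missing_cats <- function(df, cats) {
  for (col in setdiff(cats, names(df))) df[[col]] <- 0
  df[, c(key_cols, cats)]
}

stacked <- do.call(rbind, lapply(datalist, add_missing_cats, cats = all_cats))
stacked <- stacked[order(stacked$user_id, stacked$log_id), ]
```

This relies on every (user_id, log_id) pair appearing in only one data frame; if a pair could repeat, the stacked rows would need to be aggregated afterwards.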

Any help will be greatly appreciated.

nasia jaffri