I am new to R and am trying to join three data frames that share some column names. Example data frames are shown below.
dfA =
user_id log_id Coding Games Storytelling NA
001 1234 1 1 0 0
001 1235 0 0 1 0
002 1236 1 1 1 0
002 1237 0 0 0 1
dfB =
user_id log_id Coding Media Storytelling NA
003 1238 0 1 1 0
003 1239 0 0 0 1
003 1240 1 1 1 0
004 1241 0 1 0 0
004 1242 1 1 1 0
dfC =
user_id log_id Numbers Search Storytelling NA
001 1243 1 1 0 0
001 1244 0 0 1 0
003 1245 1 1 0 0
005 1246 0 0 0 1
006 1247 0 0 1 0
006 1248 1 0 1 1
007 1249 0 0 0 1
I need to join these data frames by user_id and log_id to get the following result.
user_id log_id Coding Media Numbers Search Games Storytelling NA
001 1234 1 0 0 0 1 0 0
001 1235 0 0 0 0 0 1 0
001 1243 0 0 1 1 0 0 0
001 1244 0 0 0 0 0 1 0
002 1236 1 0 0 0 1 1 0
002 1237 0 0 0 0 0 0 1
003 1238 0 1 0 0 0 1 0
003 1239 0 0 0 0 0 0 1
003 1240 1 1 0 0 0 1 0
003 1245 0 0 1 1 0 0 0
004 1241 0 1 0 0 0 0 0
004 1242 1 1 0 0 0 1 0
005 1246 0 0 0 0 0 0 1
006 1247 0 0 0 0 0 1 0
006 1248 0 0 1 0 0 1 1
007 1249 0 0 0 0 0 0 1
The columns other than user_id and log_id are categories. The full set of possible categories is:
Total_Category = Coding, Games, Storytelling, Search, Media, Numbers, NA
When I join the three data frames by user_id and log_id, the variables with the same names get overwritten in the joins. I am not sure how to fix this issue. Here is what I tried:
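library(plyr)  # join() with by= and type= comes from plyr
datalist_final <- list(dfA, dfB, dfC)
# Full join of all data frames on the two key columns, one pair at a time
final_user_logs <- Reduce(function(x, y) {
  join(x, y, by = c("user_id", "log_id"), type = "full")
}, datalist_final)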
I am thinking of renaming the variables in each data frame to something like coding_dfA, coding_dfB, and coding_dfC, doing the join, and then adding coding_dfA, coding_dfB, and coding_dfC together to get "coding". But that requires a lot of hard coding, and if any category changes in any of the data frames in the next run, I will have to change everything.
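For illustration, the renaming idea could be automated roughly as follows, so that no category name is typed by hand. This is only a sketch in base R: it uses merge() rather than the join() call above, the three data frames are trimmed stand-ins for the real ones (the NA column is left out for brevity), and it assumes the data frame names themselves contain no underscore.

```r
# Trimmed stand-ins for dfA, dfB and dfC (assumed data, not the full tables).
dfA <- data.frame(user_id = c("001", "002"), log_id = c(1234, 1236),
                  Coding = c(1, 1), Storytelling = c(0, 1))
dfB <- data.frame(user_id = c("003", "004"), log_id = c(1238, 1241),
                  Coding = c(0, 0), Media = c(1, 1), Storytelling = c(1, 0))
dfC <- data.frame(user_id = c("001", "006"), log_id = c(1243, 1247),
                  Numbers = c(1, 0), Storytelling = c(0, 1))

dfs  <- list(dfA = dfA, dfB = dfB, dfC = dfC)
keys <- c("user_id", "log_id")

# Suffix every non-key column with the name of its data frame,
# e.g. Coding becomes Coding_dfA.
suffixed <- Map(function(df, nm) {
  is_cat <- !(names(df) %in% keys)
  names(df)[is_cat] <- paste(names(df)[is_cat], nm, sep = "_")
  df
}, dfs, names(dfs))

# Full outer join on the keys; non-key names no longer clash.
merged <- Reduce(function(x, y) merge(x, y, by = keys, all = TRUE), suffixed)

# Strip the suffix to recover each base category, then sum the columns
# that share a base name. Cells missing after the join count as 0.
cat_cols <- setdiff(names(merged), keys)
base     <- sub("_[^_]+$", "", cat_cols)
totals   <- lapply(unique(base), function(b) {
  unname(rowSums(merged[cat_cols[base == b]], na.rm = TRUE))
})
names(totals) <- unique(base)

result <- cbind(merged[keys], data.frame(totals, check.names = FALSE))
```

Because the category names are read from names(df) at run time, a category that appears or disappears in the next run should not require any code change.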
Another option I can think of is rbind. Since the user_id and log_id pairs are unique, I could stack the data frames on top of each other, but then I would have to manually create the columns for the categories missing from each data frame. Is it possible to automatically check the list of categories and, if a data frame has no variable with that name, create that variable, so that I can stack the data frames instead of joining them?
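To make the rbind idea concrete, here is a minimal sketch in base R. The helper name add_missing_categories is hypothetical, the data frames are small stand-ins with a few rows each, and missing categories are assumed to mean 0. The category spelled "Numbers" follows the header of dfC.

```r
# Small stand-ins for dfA, dfB and dfC (a few rows each, assumed data).
# check.names = FALSE keeps the category column literally named "NA".
dfA <- data.frame(user_id = c("001", "002"), log_id = c(1234, 1237),
                  Coding = c(1, 0), Games = c(1, 0), Storytelling = c(0, 0),
                  `NA` = c(0, 1), check.names = FALSE)
dfB <- data.frame(user_id = c("003", "004"), log_id = c(1238, 1241),
                  Coding = c(0, 0), Media = c(1, 1), Storytelling = c(1, 0),
                  `NA` = c(0, 0), check.names = FALSE)
dfC <- data.frame(user_id = c("001", "006"), log_id = c(1243, 1248),
                  Numbers = c(1, 1), Search = c(1, 0), Storytelling = c(0, 1),
                  `NA` = c(0, 1), check.names = FALSE)

keys <- c("user_id", "log_id")
# Full category list, in the column order wanted in the result.
total_category <- c("Coding", "Media", "Numbers", "Search",
                    "Games", "Storytelling", "NA")

# Hypothetical helper: add each category missing from df as a column of
# zeros, then return the key and category columns in one fixed order.
add_missing_categories <- function(df, categories) {
  for (col in setdiff(categories, names(df))) {
    df[[col]] <- rep(0, nrow(df))
  }
  df[, c(keys, categories), drop = FALSE]
}

# Pad every data frame to the same columns, then stack and sort by the keys.
padded <- lapply(list(dfA, dfB, dfC), add_missing_categories,
                 categories = total_category)
final_user_logs <- do.call(rbind, padded)
final_user_logs <- final_user_logs[order(final_user_logs$user_id,
                                         final_user_logs$log_id), ]
rownames(final_user_logs) <- NULL
```

If the category list should not be maintained by hand, it could instead be derived from the data, for example with setdiff(Reduce(union, lapply(list(dfA, dfB, dfC), names)), keys), at the cost of losing control over the column order.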
Any help will be greatly appreciated.