I have multiple (more than 2) datasets that I want to link together to create "all data". I want to match the participant ID column (that is in every single dataset) so that there is one row per participant. Previously, I have used merge to link two sets, then continued this, but I have 20+ sets of data I need to merge and this will be too complicated.
I keep getting debugging stuff pop up, or errors related to the fact that there are differing numbers of rows. I want to remove any rows that does not match.
I have tried to create a function like this, based on code from here and other pieces of functions I can find: Merging more than 2 dataframes in R by rownames
MyMerge <- function(x, y){ df <- merge(x, y, by = "P_ID", all.x = F, all.y = F, sort = T, incomparables = NULL) intersect(names(x), names(y)) return(df) }
I have also tried it with by = "col.names"
and
filter(!x$P_ID %in% !y$P_ID)
and changing true and false
Then if I run the code as
All.data <- Reduce(MyMerge, list(df1, df2, df3, df4, df5, df6, by = "P_ID"))
or
All.data <- MyMerge(c(df1, df2, df3, df4, df5, df6, by = "P_ID"))
I get
error during wrapup:'by' must specify a uniquely valid column
but every df has "P_ID" in it, so I assume this is related to the fact that not all participants are in all datasets, as I get an error saying
error during wrapup: arguments imply differing numbers of rows
when I run the function as:
MyMerge <- function(x, y){
df <- merge(x, y, by = "col.names", all.x = F, all.y = F, sort = TRUE, incomparables = TRUE)
colnames(df) <- df$Col.names
df$Col.names <- NULL
return(df)
}
There is probably a really simple solution out there, but I am relatively new to R, and not that skilled in creating functions etc.
All datasets have different columns, they only share the ID column, so can't do rbind
, maybe cbind
, but because of uneven number of rows I wanted to merge and remove rows that don't match on ID.