1

I have multiple (more than 2) datasets that I want to link together to create "all data". I want to match the participant ID column (that is in every single dataset) so that there is one row per participant. Previously, I have used merge to link two sets, then continued this, but I have 20+ sets of data I need to merge and this will be too complicated.

I keep getting debugging stuff pop up, or errors related to the fact that there are differing numbers of rows. I want to remove any rows that does not match.

I have tried to create a function like this, based on code from here and other pieces of functions I can find: Merging more than 2 dataframes in R by rownames

MyMerge <- function(x, y){ df <- merge(x, y, by = "P_ID", all.x = F, all.y = F, sort = T, incomparables = NULL) intersect(names(x), names(y)) return(df) }

I have also tried it with by = "col.names" and filter(!x$P_ID %in% !y$P_ID) and changing true and false

Then if I run the code as

All.data <- Reduce(MyMerge, list(df1, df2, df3, df4, df5, df6, by = "P_ID"))

or

All.data <- MyMerge(c(df1, df2, df3, df4, df5, df6, by = "P_ID"))

I get

error during wrapup:'by' must specify a uniquely valid column

but every df has "P_ID" in it, so I assume this is related to the fact that not all participants are in all datasets, as I get an error saying

error during wrapup: arguments imply differing numbers of rows

when I run the function as:

MyMerge <- function(x, y){
  df <- merge(x, y, by = "col.names", all.x = F, all.y = F, sort = TRUE, incomparables = TRUE)
  colnames(df) <- df$Col.names
  df$Col.names <- NULL
  return(df)
}

There is probably a really simple solution out there, but I am relatively new to R, and not that skilled in creating functions etc.

All datasets have different columns, they only share the ID column, so can't do rbind, maybe cbind, but because of uneven number of rows I wanted to merge and remove rows that don't match on ID.

Vinícius Félix
  • 8,448
  • 6
  • 16
  • 32
Jess
  • 21
  • 3

0 Answers0