
I am attempting to merge many (~40) dataframes (e.g. pH, fieldpH, AlkT, …) into one dataframe based on a unique identifier (row ID), “SampleID”. However, when I merge them using the script below, the file reaches 75MB and crashes. The final, combined file will be large, but 75MB seems a little extreme.

Am I compounding the files somehow and accidentally making the result larger than it should be?

Any help or insight as to how to fix this problem or a suggestion of a better way to merge numerous dataframes based on row ID would be greatly appreciated!

Disclaimer: I am definitely a beginner with R and coding in general.

```r
## Subset original file ##
pH <- subset(data, VARIABLE_TRIM == "PH")
pH <- pH[c("SampleID", "MONTH", "YEAR", "FLAG", "VALUE_CONV")]
names(pH)[5] <- "pH"
fieldpH <- subset(data, VARIABLE_TRIM == "FIELD PH")
fieldpH <- fieldpH[c("SampleID", "FLAG", "VALUE_CONV")]
names(fieldpH)[3] <- "Field pH"

## Merge dataframes ##
fulldata <- Reduce(function(x, y) merge(x, y, by = "SampleID",
                                        all.x = TRUE, all.y = TRUE),
                   list(pH, fieldpH, AlkT, Hard, Hard2, DO, DOC, TOC, Spec,
                        SpecField, TempField, Temp, SbT, SbD, AsT, AsD, CdT,
                        CdD, CrT))
```
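
One possible reason a chain of merges grows far beyond the size of its inputs is repeated keys: if a "SampleID" occurs more than once in two of the frames, `merge` returns every pairing of the matching rows. Whether that is what happens here cannot be told from the question. The sketch below uses small made-up frames (the values and IDs are hypothetical, not taken from the original data) to show how to count repeated keys and what they do to the row count.

```r
# Hypothetical toy data: SampleID "A" appears twice in each frame.
pH      <- data.frame(SampleID = c("A", "A", "B"), pH = c(7.1, 7.2, 6.9))
fieldpH <- data.frame(SampleID = c("A", "A", "C"), FieldpH = c(7.0, 7.3, 6.8))

# Count repeated keys in each input before merging.
dup_counts <- sapply(list(pH = pH, fieldpH = fieldpH),
                     function(df) sum(duplicated(df$SampleID)))
print(dup_counts)

# A full outer merge pairs every "A" row in pH with every "A" row in fieldpH
# (2 x 2 = 4 rows), then adds one row each for "B" and "C".
merged <- merge(pH, fieldpH, by = "SampleID", all = TRUE)
print(nrow(merged))  # 6 rows from two 3-row inputs
```

If the counts are non-zero for the real frames, the growth compounds with each additional merge, so it may be worth deciding how to handle the repeated IDs before joining.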
zx8754
cbmiller
  • Definitely check out the `join_all()` function from the plyr package. – vanao veneri Feb 28 '19 at 15:07
  • Possible duplicate of [How to join (merge) data frames (inner, outer, left, right)?](https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right) – divibisan Feb 28 '19 at 15:12
  • This duplicate gives you many different options for joining/merging dataframes – divibisan Feb 28 '19 at 15:13
  • @divibisan I don't think this is a duplicate; the related post is this one: [merge list of dataframes](https://stackoverflow.com/questions/8091303). And the issue is that their implementation is not giving the expected output. – zx8754 Feb 28 '19 at 15:26
  • Maybe. From the question as it's stated, there doesn't seem to be anything particularly unusual here, they're just having problems getting `merge` to work, in which case, barring any clarifying edits, directing them to the canonical question on joining data frames is the best thing we can do. Your duplicate is definitely a better choice, though – divibisan Feb 28 '19 at 15:59
  • This may be a problem of going from long to wide format, in which case options other than merge are better. A sample of the data for 1 or 2 sample IDs would make the problem easier to understand. – Dave2e Feb 28 '19 at 16:39
  • The plyr package worked fantastically! Thank you all! Apologies for the delay, I'm still not sure how to indicate that one of the comments provided the solution. – cbmiller Mar 10 '19 at 14:50
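
For readers landing here, a minimal sketch of the `join_all()` suggestion from the comments follows. It assumes the plyr package is installed and uses three small made-up frames (hypothetical IDs and values); the asker's exact call is not shown in the thread, so `type = "full"` is only an assumption chosen to mirror the `all.x = TRUE, all.y = TRUE` merge in the question.

```r
library(plyr)  # assumes the plyr package is installed

# Hypothetical toy frames, each with a unique value column.
pH      <- data.frame(SampleID = c("S1", "S2"), pH = c(7.1, 6.9))
fieldpH <- data.frame(SampleID = c("S2", "S3"), FieldpH = c(7.0, 6.8))
AlkT    <- data.frame(SampleID = c("S1", "S3"), AlkT = c(120, 95))

# Join the whole list on SampleID, keeping IDs that appear in any frame.
fulldata <- join_all(list(pH, fieldpH, AlkT), by = "SampleID", type = "full")
print(fulldata)
```

The toy frames deliberately avoid shared non-key columns such as "FLAG"; with the real data, renaming those columns per variable before joining would probably keep the result easier to read.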
