I have a function to deduplicate a data frame so that each person (indexed by PatID) is represented once by the latest record (largest RecID):

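dedupit <- function(x) {
  # Sort by patient, with the largest RecID first within each patient
  x <- x[order(x$PatID, -x$RecID), ]
  # Keep only the first (latest) row for each PatID
  x <- x[!duplicated(x$PatID), ]
  return(x)
}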

It can deduplicate and replace a dataframe if I do:

df <- dedupit(df)
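To make the behaviour concrete, here is a small sketch on made-up data. The data frame `toy` and its values are hypothetical and only serve to illustrate what the function keeps:

```r
# dedupit() as defined in the question
dedupit <- function(x) {
  x <- x[order(x$PatID, -x$RecID), ]
  x <- x[!duplicated(x$PatID), ]
  return(x)
}

# Hypothetical example data: three patients, some with several records
toy <- data.frame(PatID = c(1, 1, 2, 2, 2, 3),
                  RecID = c(10, 12, 7, 9, 8, 5))

# One row per PatID remains, namely the one with the largest RecID
toy_dedup <- dedupit(toy)
print(toy_dedup)
```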

But I have multiple data frames that need deduplication. Rather than write the above code for each individual data frame, I would like to apply the dedupit function across multiple data frames at once, so that each original data frame is replaced by its deduplicated version.

I was able to make a list of the dataframes and lapply the function across each element in the list with:

listofdfs <- list(df1, df2, ....)
listofdfs <- lapply(listofdfs, dedupit)

However, this only modifies the elements of the list and does not replace the original data frames. How do I apply this function so that it modifies and replaces multiple data frames?

    This is the recommended way of handling multiple dataframes. Keeping them in a list is cleaner than filling your global environment with dataframes. – Thomas May 01 '14 at 19:54

1 Answer

Name your data frames when creating the list, so that you can recover them afterwards by name:

list.df <- list(df1 = df1, df2 = df2, df3 = df3)

list2env(lapply(list.df, dedupit), .GlobalEnv)

As a result, your data frames df1, df2 and df3 will be replaced by their deduplicated versions.
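For a self-contained illustration of this approach, the sketch below uses two made-up data frames (their contents are hypothetical) and assumes the code is run at the top level of a script or session, so that `df1` and `df2` live in the global environment:

```r
# dedupit() as defined in the question
dedupit <- function(x) {
  x <- x[order(x$PatID, -x$RecID), ]
  x <- x[!duplicated(x$PatID), ]
  return(x)
}

# Hypothetical example data
df1 <- data.frame(PatID = c(1, 1, 2), RecID = c(1, 2, 3))
df2 <- data.frame(PatID = c(5, 5, 5), RecID = c(4, 6, 5))

# The list names must match the object names that should be overwritten
list.df <- list(df1 = df1, df2 = df2)

# lapply() keeps the names; list2env() assigns each element to the
# variable of the same name in the target environment
list2env(lapply(list.df, dedupit), envir = .GlobalEnv)

# The global df1 and df2 now hold one row per PatID
print(get("df1", envir = .GlobalEnv))
print(get("df2", envir = .GlobalEnv))
```

Note that list2env() silently overwrites existing objects with the same names, so keeping the named list around (or working on the list throughout, as the comment under the question suggests) may be the safer habit. If there are many data frames, mget() with a character vector of object names is one way to build the named list without typing each name twice.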

See also: "unlist a list of dataframes"
