1

Using a particular function, I wish to merge pairs of data frames, for multiple pairings in an R directory. I am trying to write a ‘for loop’ that will do this job for me, and while related questions such as Merge several data.frames into one data.frame with a loop are helpful, I am struggling to adapt example loops for this particular use.

My data frames end with either “_df1.csv” or ‘_df2.csv”. Each pair, that I wish to merge into an output data frame, has an identical number at the being of the file name (i.e. 543_df1.csv and 543_df2.csv).

I have created a character string for each of the two types of file in my directory using the list.files command as below:

df1files <- list.files(path="~/Desktop/combined files” pattern="*_df1.csv", full.names=T, recursive=FALSE)
df2files <- list.files(path="="~/Desktop/combined files ", pattern="*_df2.csv", full.names=T, recursive=FALSE)

The function and commands that I want to apply in order to merge each pair of data frames are as follows:

findRow <- function(dt, df) { min(which(df$datetime > dt )) }
rows <- sapply(df2$datetime, findRow, df=df1)
merged <- cbind(df2, df1[rows,])

I am now trying to incorporate these commands into a for loop starting with something along the following lines, to prevent me from having to manually merge the pairs:

for(i in 1:length(df2files)){ ……

I am not yet a strong R programmer, and have hit a wall, so any help would be greatly appreciated.

Community
  • 1
  • 1
Emily
  • 859
  • 5
  • 14
  • 31

1 Answers1

1

My intuition (which I haven't had a chance to check) is that you should be able to do something like the following:

# read in the data as two lists of dataframes:
dfs1 <- lapply(df1files, read.csv)
dfs2 <- lapply(df2files, read.csv)

# define your merge commands as a function
merge2 <- function(df1, df2){
    findRow <- function(dt, df) { min(which(df$datetime > dt )) }
    rows <- sapply(df2$datetime, findRow, df=df1)
    merged <- cbind(df2, df1[rows,])
}

# apply that merge command to the list of lists
mergeddfs <- mapply(merge2, dfs1, dfs2, SIMPLIFY=FALSE)

# write results to files
outfilenames <- gsub("df1","merged",df1files)
mapply(function(x,y) write.csv(x,y), mergeddfs, outfilenames)
Thomas
  • 43,637
  • 12
  • 109
  • 140
  • thanks for the above. dfs creates a two-element list matrix with df1 and df2, but the lapply(dfs, merge2) command suggests that the df2 argument is missing. I got the following error message: Error in lapply(X = X, FUN = FUN, ...) : argument "df2" is missing, with no default. Any idea how to solve this? – Emily Jun 27 '13 at 16:09
  • and how would I save the resulting merged data frames from my list matrix as csv files using the write.csv(data, "dataname.csv") command within the coding? – Emily Jun 27 '13 at 16:15
  • thanks but the only way that I could create two lists of dataframes was to rearrange your code to the following: `lapply(df1files, function(x) read.csv)`. This then worked but the mapply command still did not give me correctly merged datasets i.e. all the cells from df1 now read 'NA', and I get a large no. of warnings! – Emily Jun 27 '13 at 18:45
  • in terms of writing the output csv files for each merged pair, I tried adding the following to the merge2 function: `files <- as.character(paste("file", ".csv", sep=""))`followed with `write.csv(merged, files)`. However, I am aware that I am missing a link to the list of original files. Would you mind providing me with a solution? I've tried different things and I can't get it to work. Thanks. – Emily Jun 27 '13 at 18:49
  • @ Thomas thanks for your suggestions but I still get error messages (nas introduced by coercion) and all of the values from df1 read as NAs in the merged dataframes. Brilliant if you have any further suggestions, but otherwise I will go back to the drawing board! – Emily Jun 28 '13 at 09:15
  • Where are you getting the error? I think it might be do to how you're calling your `findRow` function. My code works with some dummy dataframes and a standard `merge` command in `mapply`. – Thomas Jun 28 '13 at 11:18
  • 1
    I have sorted this now. The function doesn't like factors, so `stringsAsFactors=FALSE` needs to added to the command when reading in the lists of data frames. The write.csv command currently overwrites the original files in the directory, but I can change this. Thanks for your help! – Emily Jun 28 '13 at 12:05