
I am facing a small issue with R, and prior searches on the forum have not turned up a solution.

Specifically, I have a list of authors and their publications, as well as further administrative data relating to those publications. The key column that links the administrative data to each publication is the 2nd column in both datasets.

I have written the following code to match each author's administrative data to their publications:

for (file in file_list) {
  XX <- read.csv(paste(file, "_Dets.csv", sep = ""))
  YY <- read.csv(paste(file, "_Cits.csv", sep = ""))
  file <- merge(XX, YY, by = 2:2, all = F)
}

Unfortunately, instead of producing N separate outputs (one per author), the loop overwrites `file` on every iteration, so only the last merge is left. How do I fix this?

I am using the latest version of R on a Mac.

The file_list looks like this:

[1] "Weils_Raymond"
[2] "Lucas_George"
...
[30] "Clinton_Peel"
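The overwriting happens because `file` is the loop variable: the assignment replaces the current author name with the merged data frame, R resets `file` to the next element at the start of each iteration, and after the loop only the last merge remains. A minimal sketch of the usual fix is to preallocate a list and store each merge in a slot named after the author. The author names and the CSV contents below are hypothetical and are written to a temporary directory only so the example runs on its own; like the question, it assumes the key is the 2nd column of both files.

```r
# Hypothetical author names standing in for the real file_list
file_list <- c("Author_A", "Author_B")
tmp_dir <- tempdir()

# Write dummy input files (assumed layout: key in column 2 of both files)
for (f in file_list) {
  write.csv(data.frame(author = f, pub_id = 1:3, title = c("a", "b", "c")),
            file.path(tmp_dir, paste0(f, "_Dets.csv")), row.names = FALSE)
  write.csv(data.frame(source = "admin", pub_id = 2:4, cites = c(5, 7, 9)),
            file.path(tmp_dir, paste0(f, "_Cits.csv")), row.names = FALSE)
}

# Preallocate one named slot per author instead of reusing the loop variable
results <- vector("list", length(file_list))
names(results) <- file_list

for (f in file_list) {
  XX <- read.csv(file.path(tmp_dir, paste0(f, "_Dets.csv")))
  YY <- read.csv(file.path(tmp_dir, paste0(f, "_Cits.csv")))
  # Join on the 2nd column of each file, whatever it happens to be called
  results[[f]] <- merge(XX, YY, by.x = names(XX)[2], by.y = names(YY)[2])
}

str(results, max.level = 1)
```

Afterwards `results[["Author_A"]]` holds that author's merged data, so the author name stays attached to each result without adding an identifier column.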
grievy

1 Answer


We can use `Map` for this: it walks through the corresponding elements of the two lists of data frames and merges each pair. It would have helped if the OP had shown what is in `file_list`.

 # One merged data.frame per author, returned as a list
 res <- Map(function(x, y) merge(x, y, by.x = names(x)[2], by.y = names(y)[2]),
            lapply(paste0(file_list, "_Dets.csv"), read.csv,
                   stringsAsFactors = FALSE),
            lapply(paste0(file_list, "_Cits.csv"), read.csv,
                   stringsAsFactors = FALSE))
akrun
  • The `file_list` contains author names which I am uncomfortable disclosing on a public site. Regardless, `file_list` refers to a vector of character strings of varying lengths depending on how long the author's name is. – grievy Jan 18 '16 at 05:49
  • I just gave your code a try but it returns the error: `Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column` – grievy Jan 18 '16 at 05:57
  • @grievy You don't need to provide the exact contents of `file_list`; instead, show a small example that mimics the structure of your data – akrun Jan 18 '16 at 06:14
  • @grievy I updated the answer with `by.x` and `by.y` (in case the key column names differ between the two groups of datasets). – akrun Jan 18 '16 at 06:15
  • @grievy We cannot test the code without a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Based on the error you mentioned, the updated code will probably work. – akrun Jan 18 '16 at 06:18
  • I have added an example of what `file_list` looks like to my original question above. – grievy Jan 18 '16 at 06:24
  • Just tried it; it returns a huge list, whereas I was hoping for, say, 30 separate data frames, one per author. How can I get that outcome? – grievy Jan 18 '16 at 06:30
  • @grievy How many elements are there in the `file_list`? What do you mean by `huge list`? The `Map` output is a `list` of `data.frames`. So, if there are 30 elements in `file_list`, the result would be `30` data.frames in the `list` – akrun Jan 18 '16 at 06:33
  • Ah, I understand, thank you! I really appreciate the help. I initially assumed that `Map` would give me 30 separate data frames rather than putting them all in a list. – grievy Jan 18 '16 at 06:39
  • @grievy If you need 30 separate data frames, that can be done too (though it unnecessarily creates objects in the global environment). For example, if `res <- Map(....)`, then `list2env(setNames(res, paste0("dfNew_", seq_along(res))), envir=.GlobalEnv)` – akrun Jan 18 '16 at 06:41
  • I understand what you mean. Part of the reason is that I would still like to identify each data frame by its author, since I have not created a unique author identifier within the data frames. – grievy Jan 18 '16 at 06:51
  • @grievy In that case, replace the `paste0(...)` names with the vector of unique author names and then create those objects in the global environment. – akrun Jan 18 '16 at 06:54
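A minimal sketch of the last two comments combined: the objects are named after the authors instead of `dfNew_1`, `dfNew_2`, and so on. Here `res` is a stand-in for the `Map` output (two small hypothetical data frames), and `authors` is a hypothetical name vector. The sketch assigns into a fresh environment `out` so it does not clutter the workspace; passing `envir = .GlobalEnv` instead reproduces the comment exactly.

```r
# Stand-in for res <- Map(...): one data frame per author (hypothetical data)
authors <- c("Author_A", "Author_B")
res <- list(data.frame(pub_id = 1:2, cites = c(3, 4)),
            data.frame(pub_id = 5L, cites = 6))

# Name each element after its author, then create one object per name.
# Swap `out` for .GlobalEnv to create the objects in the workspace.
out <- new.env()
invisible(list2env(setNames(res, authors), envir = out))

ls(out)
get("Author_B", envir = out)
```

Author names that are not syntactic R names (for example, containing spaces) would need backticks when referred to later. Keeping the named list and indexing it with `res[["Author_A"]]` gives the same identification without creating extra objects.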