-1

I'm trying to pass in a list of file names and merge the corresponding data frames. I build the list of file names by reading all the files in the directory. My problem is not with merging the data frames, it's that the data frame list is being interpreted as character strings.

temp.table <- merge(all.data.frames[1], all.data.frames[2], by = "hostname", all = TRUE)

Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column

Here is my code that reads the directory to list all the data frames and I'm trying to merge them all on a common variable:

setwd("C:/temp")
all.files <- ls()
all.data.frames <- all.files[sapply(all.files, function(x) is.data.frame(get(x)))]

This makes a list equivalent to:

c("data.frame1.dt", "data.frame2.dt", "data.frame3.dt", ...)

Here is some test data:

data.frame1.dt <- data.frame("hostname" = c("a", "b", "c"), "username" = c("dkfj", "adfkjds", "adklfh"), "tele" = c(12,43,56))

data.frame2.dt <- data.frame("hostname" = c("d", "b", "q"), "username" = c("dkfjdfs", "adfdtkjds", "adasdklfh"), "scan" = c(132,403,546))

data.frame3.dt <- data.frame("hostname" = c("a", "q", "x", "Ip1"), "timer" = c("Log1", "Log3", "Log5", "LAG"), "cpu" = c("1343232-a","4354342-03", "5dfasd46", "Nothing"))

The merging of the data works like this:

xx <- merge(data.frame1.dt, data.frame2.dt, by = "hostname", all = TRUE)

total.join <- merge(xx, data.frame3.dt, by = "hostname", all = TRUE)

Giving the result:

  hostname username.x tele username.y scan timer        cpu
1        a       dkfj   12       <NA> <NA>  Log1  1343232-a
2        b    adfkjds   43  adfdtkjds  403  <NA>       <NA>
3        c     adklfh   56       <NA> <NA>  <NA>       <NA>
4        d       <NA> <NA>    dkfjdfs  132  <NA>       <NA>
5        q       <NA> <NA>  adasdklfh  546  Log3 4354342-03
6      Ip1       <NA> <NA>       <NA> <NA>   LAG    Nothing
7        x       <NA> <NA>       <NA> <NA>  Log5   5dfasd46

How can I do this merge on the entire directory of data.frame files? Right now they are being interpreted as character strings.

jrzelling
  • 405
  • 2
  • 17
  • the problem isn't how to merge, it's that the file list is being interpreted as a list of character strings because of the way I'm reading the files in the directory. – jrzelling Aug 15 '16 at 20:00
  • 1
    First word in your title is *merge*. Take it as 2 steps: 1st, read all files into a list object, see [this post](http://stackoverflow.com/questions/11433432/importing-multiple-csv-files-into-r), then use above post for the 2nd step. – zx8754 Aug 15 '16 at 20:03
  • 1
    They are being treated as character strings because that's exactly what `all.data.frames` contains: a character vector of the names of the objects. If you want to turn that vector of names to a list of those data.frames, try `mget(all.data.frames)`. Then you can follow all the other duplicate questions about merging a list of data.frames. – MrFlick Aug 15 '16 at 20:44

2 Answers2

1

It is easier to apply the merge or join function to a list recursively. First read the files from the directory into a list.

 filenames <- list.files(path = ".", pattern="*.csv", full.names=TRUE)
 list_of_df <- lapply(filenames, read.csv)

Then call the join function on your list of data frames.

do.call("join", list_of_df)

On some sample data

do.call("join",list(data.frame(hostname = 1:10, x = rnorm(10)), 
                    data.frame(hostname = 8:12, y = rnorm(5))))
shayaa
  • 2,787
  • 13
  • 19
  • I don't even have csv files – jrzelling Aug 15 '16 at 19:45
  • what extension is e.g., `data.frame2.dt`, perhaps `.dt`? – shayaa Aug 15 '16 at 19:49
  • there are 43 .dt files. I don't understand if I'm supposed to pass in a list of file names or a list of actual data frames. I'm trying to pass in a list of filenames. – jrzelling Aug 15 '16 at 19:56
  • You can do either. If you have a list of data frames you can just use the last snippet, otherwise you need to create the list of data frames from the directory by passing a list of file names to the function which reads those file extensions using `lapply`. Not sure which function reads .dt extension. – shayaa Aug 15 '16 at 20:08
1

You haven't created data frames properly. Change from cbind() to data.frame() and you will get the desired result.

typeof(data.frame1.dt) will give you an idea of what mistake you are making.

data.frame1.dt <- data.frame(hostname = c("a", "b", "c"), username = c("dkfj", "adfkjds", "adklfh"), tele = c(12,43,56))

data.frame2.dt <- data.frame(hostname = c("d", "b", "q"), username = c("dkfjdfs", "adfdtkjds", "adasdklfh"), scan = c(132,403,546))

data.frame3.dt <- data.frame(hostname = c("a", "q", "x", "Ip1"), timer = c("Log1", "Log3", "Log5", "LAG"), cpu = c("1343232-a","4354342-03", "5dfasd46", "Nothing"))


xx <- merge(data.frame1.dt, data.frame2.dt, by = "hostname", all = TRUE)

total.join <- merge(xx, data.frame3.dt, by = "hostname", all = TRUE)
Pj_
  • 824
  • 6
  • 15
  • that is just test data. I build the character list by reading the file names from the directory, the data frames are already created. However I will update the question, thanks. – jrzelling Aug 15 '16 at 19:58