I would like to merge several files column-wise. The files do not have the same number of rows. The output should contain all rows, with a count of 0 where a row is not present in a given file.

I tried something like:

# list.files() takes a regular expression, not a shell glob
file_list <- list.files(pattern = "\\.mature$")

dataset_tumor <- do.call("cbind", lapply(file_list,
    FUN = function(files) {
        read.table(files, header = TRUE, sep = "")
    }))
Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 497, 642, 692, 694, 699, 515, 707, 740, 605, 568, 602, 512, 624, 634, 551, 662, 750, 442, 615, 557, 466, 638, 560, 576, 851, 705, 614, 547, 670, 752, 586, 671, 754, 603, 666, 587, 601, 572, 550, 573, 621, 650, 701, 622, 735, 434, 742, 737, 809, 661, 540, 645, 722, 594, 681, 659, 781, 613, 641, 756, 595, 966, 658, 539, 520, 619, 564, 732, 679, 596, 536, 518, 631, 691, 708, 625, 630, 589, 639, 538


> head(a.mature)
                 X4
hsa-let-7a-5p 12342
hsa-let-7b-3p    27
hsa-let-7b-5p 47413
hsa-let-7c-5p  2825
hsa-let-7d-3p  1162
hsa-let-7d-5p   219
> head(b.mature)
                X15
hsa-let-7a-5p 28868
hsa-let-7b-3p    41
hsa-let-7b-5p 62259
hsa-let-7c-5p  4468
hsa-let-7k-3p  2027
hsa-let-7f-5p   938

Desired output:

                 X4   X15
hsa-let-7a-5p 12342 28868
hsa-let-7b-3p    27    41
hsa-let-7b-5p 47413 62259
hsa-let-7c-5p  2825  4468
hsa-let-7d-3p  1162     0
hsa-let-7d-5p   219     0
hsa-let-7k-3p     0  2027
hsa-let-7f-5p     0   938
user2300940

1 Answer

As with primary and foreign keys in a database, you need a column that both datasets share in order to combine them. Take the example from the documentation of the merge function:

authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))

books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

Here we have two data frames, and the surname column in authors holds the same values as the name column in books. We can therefore merge the datasets on those columns:

m1 <- merge(authors, books, by.x = "surname", by.y = "name")

By default, merge keeps only the rows that match in both data frames. If you want to keep all the books in the combined data frame, use all.y or all.x, depending on whether books is the second or the first argument:

m1 <- merge(authors, books, by.x = "surname", by.y = "name", all.y = TRUE)

OR

m1 <- merge(books, authors, by.x = "name", by.y = "surname", all.x = TRUE)
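
To make the difference between these options concrete, here is a minimal sketch on two made-up data frames (left, right and their columns are hypothetical, not taken from the question). It also shows all = TRUE, which keeps the unmatched rows from both sides:

```r
# Hypothetical miniature tables whose key column "id" overlaps only partly.
left  <- data.frame(id = c("a", "b", "c"), x = 1:3)
right <- data.frame(id = c("b", "c", "d"), y = 4:6)

# Default: only ids present in both tables survive.
inner <- merge(left, right, by = "id")

# all.x = TRUE: every row of the first table is kept; missing y becomes NA.
keep_left <- merge(left, right, by = "id", all.x = TRUE)

# all = TRUE: rows from both tables are kept; gaps on either side become NA.
full <- merge(left, right, by = "id", all = TRUE)
```

The last form is the one that matches the situation in the question, where a row may be absent from either file.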

Similarly, you can use the join_all function in the plyr package, which can merge more than two data frames at once.
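
If you would rather stay in base R, the same idea can be sketched with Reduce and merge. This is only an illustration under some assumptions: a and b below are stand-ins built from the head() output in the question, and the helper with_key and the column name mirna are made up for the example. The row names are first copied into an explicit key column, then every table is merged with all = TRUE, and finally the NA values are replaced by 0 as the question asks.

```r
# Stand-ins for two of the .mature files, copied from the question's head() output.
a <- data.frame(X4 = c(12342, 27, 47413, 2825, 1162, 219),
                row.names = c("hsa-let-7a-5p", "hsa-let-7b-3p", "hsa-let-7b-5p",
                              "hsa-let-7c-5p", "hsa-let-7d-3p", "hsa-let-7d-5p"))
b <- data.frame(X15 = c(28868, 41, 62259, 4468, 2027, 938),
                row.names = c("hsa-let-7a-5p", "hsa-let-7b-3p", "hsa-let-7b-5p",
                              "hsa-let-7c-5p", "hsa-let-7k-3p", "hsa-let-7f-5p"))

# Hypothetical helper: copy the row names into a key column called "mirna".
with_key <- function(df) {
    data.frame(mirna = rownames(df), df,
               stringsAsFactors = FALSE, check.names = FALSE)
}
tables <- lapply(list(a, b), with_key)

# Merge the whole list pairwise; all = TRUE keeps rows missing from some tables.
merged <- Reduce(function(x, y) merge(x, y, by = "mirna", all = TRUE), tables)

# Rows absent from a file come back as NA; turn those into counts of 0.
merged[is.na(merged)] <- 0
```

With the real data, tables would presumably be built by reading each entry of file_list with read.table and passing the result through with_key. Note that merge sorts the result by the key column, so the row order differs from the input files.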

discipulus
  • join_all seems to work. However, how do I include the rows that are not found in all files? I need to include them as NA. – user2300940 Feb 15 '16 at 14:19
  • @user2300940 : see this answer http://stackoverflow.com/a/21438584/476907 ... there is the detail of how you can do it. – discipulus Feb 15 '16 at 18:46