1

I have 2 datasets:

Data1:

Var1 Var2   Var3    Var4
10    10      2   3
9      2      8   3
6      4      4   8
7      3     10   8

Data2:

Var1 Var5   Var3    Var6
  3    6      6   4
  1    2      5   1
  9    2      2   9
  2    6      3   2

Now I want to append these two datasets:

Final Data:

Var1  Var2    Var3  Var4  Var5 Var6
10      10       2     3        
9        2       8     3        
6        4       4     8        
7        3      10     8        
3                      4     6    6
1                      1     2    5
9                      9     2    2
2                      2     6    3

I can't use rbind to create this dataset. Can anybody please tell me the method to create this dataset? Also, suppose I want to append multiple (more than 2) datasets. What's the procedure?

sebastian-c
Beta
  • type ?merge at the R prompt. Also read this http://stackoverflow.com/questions/1299871/how-to-join-data-frames-in-r-inner-outer-left-right – Yoda Sep 23 '12 at 13:15
  • Will **merge** give me one dataset below the other? I want to stack the datasets. – Beta Sep 23 '12 at 13:17
  • Yes, just play around with `merge(x=Data1, y=Data2, by.x='Var1', by.y='Var1', all=TRUE)` or something very similar. – Yoda Sep 23 '12 at 13:28
  • I tried to fix your formatting, but now I'm confused. Is that what you want your "Final Data" to look like? – GSee Sep 23 '12 at 13:39
  • This is exactly what my final data look like. Thank you GSee! Actually, in SAS we use either the "SET" or the "PROC APPEND" statement to stack datasets, so I was confused by merge here. But the second question still remains: how do I do it for more than 2 datasets? – Beta Sep 23 '12 at 13:59
  • @user697363 That depends on how you have your data. If it's in a list, then [this](http://stackoverflow.com/questions/8091303/merge-multiple-data-frames-in-a-list-simultaneously) might be helpful. – sebastian-c Sep 23 '12 at 14:05

4 Answers

7

I recommend the function rbind.fill of the plyr package:

library(plyr)
rbind.fill(Data1, Data2)

#  Var1 Var2 Var3 Var4 Var5 Var6
#1   10   10    2    3   NA   NA
#2    9    2    8    3   NA   NA
#3    6    4    4    8   NA   NA
#4    7    3   10    8   NA   NA
#5    3   NA    6   NA    6    4
#6    1   NA    5   NA    2    1
#7    9   NA    2   NA    2    9
#8    2   NA    3   NA    6    2

The major advantage of this technique is that it's not limited to two data frames, but allows combining any number of data frames.
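If plyr is not available, the same idea can be sketched in base R: pad each data frame with the columns it lacks (filled with NA), then stack with plain `rbind`. This is only a rough illustration of the technique, not plyr's actual implementation; the helper name `rbind_fill_base` is made up for this example.

```r
Data1 <- data.frame(Var1 = c(10, 9, 6, 7), Var2 = c(10, 2, 4, 3),
                    Var3 = c(2, 8, 4, 10), Var4 = c(3, 3, 8, 8))
Data2 <- data.frame(Var1 = c(3, 1, 9, 2), Var5 = c(6, 2, 2, 6),
                    Var3 = c(6, 5, 2, 3), Var6 = c(4, 1, 9, 2))

# Hypothetical helper (not part of plyr): stack any number of data frames,
# filling columns that a frame does not have with NA
rbind_fill_base <- function(...) {
  frames <- list(...)
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(frames, names)))
  padded <- lapply(frames, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))
    }
    df[all_cols]  # same column order in every frame
  })
  do.call(rbind, padded)
}

stacked <- rbind_fill_base(Data1, Data2)
stacked
```

Because the helper takes `...`, it accepts more than two data frames, e.g. `rbind_fill_base(Data1, Data2, Data1)`.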

If the data still needs to be read from disk, you can do something like:

file_list = list.files()
data_list = lapply(file_list, read.table)
data_combined = do.call("rbind.fill", data_list)
Paul Hiemstra
Sven Hohenstein
  • As I mentioned in the comment below I've already tried it. But for some reason it's not working. Maybe some silly error I'm doing. But thanks for your answer. – Beta Sep 23 '12 at 17:34
  • What is the output of `str(Data1)`? – Sven Hohenstein Sep 23 '12 at 17:37
  • For the dataset I'm using, `str(mydata)` gives "'data.frame': 792 obs. of 16 variables". Also, it is treating the character variables as factors. "Data1" is just a made-up example. – Beta Sep 23 '12 at 17:43
  • @user697363 What error message is displayed when you use `rbind.fill`? Does it work if you combine the data frame with itself? – Sven Hohenstein Sep 23 '12 at 17:45
  • It's not showing any error. It's just that the datasets are not combining the way I want them to. Not all the variable names are there in the combined dataset, and there are many missing values. So maybe the datasets I've created were not built properly. I'll work through it again. As you have suggested, rbind.fill should work. I might be making a mistake somewhere. – Beta Sep 23 '12 at 17:49
  • @SvenHohenstein I suspect your solution here is the best as it won't remove duplicate entries as a merge solution would. – sebastian-c Sep 25 '12 at 00:44
  • Hi Sven, rbind.fill worked. But Sathish's solution is better. I've upvoted your response too. – Beta Sep 25 '12 at 17:51
  • If you get rbind.fill working, that would be a way better solution than Satish's. If your problem is that you need to read the datasets first, I've edited in some code to do that. – Paul Hiemstra Sep 25 '12 at 18:27
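To illustrate the point raised in the comments about duplicate entries, here is a minimal base R sketch. The toy data frames `A` and `B` are made up for this example; they share all their columns and contain identical rows.

```r
# Hypothetical toy data: both frames contain exactly the same rows
A <- data.frame(Var1 = c(1, 2), Var3 = c(5, 6))
B <- data.frame(Var1 = c(1, 2), Var3 = c(5, 6))

# merge() joins on the shared columns, so identical rows are matched
# to each other instead of being kept twice
merged <- merge(A, B, all = TRUE)

# rbind() stacks the rows, so every row from both frames is kept
stacked <- rbind(A, B)

nrow(merged)
nrow(stacked)
```

So if the goal is a SAS-style append where every input row survives, a stacking function is the safer choice; `merge` is a join.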
5
merge(Data1, Data2, all=TRUE, sort=FALSE)

  Var1 Var3 Var2 Var4 Var5 Var6
1   10    2   10    3   NA   NA
2    9    8    2    3   NA   NA
3    6    4    4    8   NA   NA
4    7   10    3    8   NA   NA
5    3    6   NA   NA    6    4
6    1    5   NA   NA    2    1
7    9    2   NA   NA    2    9
8    2    3   NA   NA    6    2

EDIT: A way to combine multiple frames, as detailed here.

Combining more than 2 frames

Data3

  Var1 Var3 Var5 Var6
1    2    6    4    1
2   10    1    6    1
3    1    6    3    1
4    9    5    5    7

We'll need to put your data into a list and use a nice package called reshape.

datalist <- list(Data1, Data2, Data3)
library(reshape)

merge_recurse(datalist)
   Var1 Var3 Var2 Var4 Var5 Var6
1    10    2   10    3   NA   NA
2     9    8    2    3   NA   NA
3     6    4    4    8   NA   NA
4     7   10    3    8   NA   NA
5     3    6   NA   NA    6    4
6     1    5   NA   NA    2    1
7     9    2   NA   NA    2    9
8     2    3   NA   NA    6    2
9     2    6   NA   NA    4    1
10   10    1   NA   NA    6    1
11    1    6   NA   NA    3    1
12    9    5   NA   NA    5    7
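If you would rather not depend on reshape, a commonly used base R idiom is to fold `merge` over the list with `Reduce`. The sketch below rebuilds the three example data frames and applies that idiom; it is similar in spirit to `merge_recurse` but is not its implementation, and the column order of the result may differ from the output shown above.

```r
Data1 <- data.frame(Var1 = c(10, 9, 6, 7), Var2 = c(10, 2, 4, 3),
                    Var3 = c(2, 8, 4, 10), Var4 = c(3, 3, 8, 8))
Data2 <- data.frame(Var1 = c(3, 1, 9, 2), Var5 = c(6, 2, 2, 6),
                    Var3 = c(6, 5, 2, 3), Var6 = c(4, 1, 9, 2))
Data3 <- data.frame(Var1 = c(2, 10, 1, 9), Var3 = c(6, 1, 6, 5),
                    Var5 = c(4, 6, 3, 5), Var6 = c(1, 1, 1, 7))

datalist <- list(Data1, Data2, Data3)

# Merge the first two frames, then merge the result with the third, and so on.
# all = TRUE keeps rows that have no match in the other frame (full outer join).
merged_all <- Reduce(function(x, y) merge(x, y, all = TRUE), datalist)
merged_all
```

Keep in mind that this is still a join: rows that agree on all shared columns are combined into one row rather than stacked.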
Community
sebastian-c
1
# Open a new directory and keep only the data files to be combined
combinedfiles <- function() {
  # readTab: read one space-separated file that has a header row
  readTab <- function(y) {
    read.table(y, header = TRUE, sep = " ")
  }

  # Start from NULL so that the first rbind() just returns the first file's data
  objectcontent <- NULL

  # Loop over every file in the working directory, growing the result as we go
  files <- list.files(getwd())
  for (i in seq_along(files)) {
    objectcontent <- rbind(objectcontent, readTab(files[i]))
  }
  return(objectcontent)
}

# Then type the following in the console
combinedfiles()

A version using `lapply` (which does not suffer from the slowdown caused by calling rbind repeatedly inside a loop):

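combined_files = function(file_path, extension = "csv") {
   library(plyr)
   # full.names = TRUE returns complete paths, so file_path need not be the working directory;
   # the pattern only matches files that end in the given extension
   file_list = list.files(file_path, pattern = paste0("\\.", extension, "$"), full.names = TRUE)
   data_list = lapply(file_list, read.table, header = TRUE, sep = " ")
   combined_data = do.call("rbind.fill", data_list)
   return(combined_data)
 }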
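The read-into-a-list-then-bind-once pattern can be tried without any real data files. The sketch below is only an illustration under made-up assumptions: it writes three small space-separated files (hypothetical names `part1.csv` to `part3.csv`) into a temporary directory, reads them with `lapply`, and binds them in a single call. All files share the same columns here, so base `rbind` is enough; with differing columns you would swap in `rbind.fill`.

```r
# Create a throwaway directory with three small example files
tmp_dir <- tempfile("combine_demo_")
dir.create(tmp_dir)
for (k in 1:3) {
  write.table(data.frame(id = k, value = k * 10),
              file = file.path(tmp_dir, paste0("part", k, ".csv")),
              row.names = FALSE, sep = " ")
}

# Read every file into a list first ...
file_list <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
data_list <- lapply(file_list, read.table, header = TRUE, sep = " ")

# ... then bind all of them in one call instead of growing a result in a loop
combined <- do.call(rbind, data_list)

# Clean up the temporary files
unlink(tmp_dir, recursive = TRUE)
combined
```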
Paul Hiemstra
Sathish
  • Thanks Sathish! You could have edited your previous answer and written it there. This is a far better solution than the previous ones. – Beta Sep 25 '12 at 17:50
  • IMHO, this is an _awful_ solution: reading data from disk, looping over files in `getwd()`, and `rbind`ing as you go. It's a good example of how not to use **R**. – GSee Sep 25 '12 at 17:57
  • Hi GSee, thanks for your encouraging comment. Why don't you edit the code and make it better, so that I can also learn from you how to write better code? However, mine worked for the question poster. Thanks :) – Sathish Sep 25 '12 at 18:09
  • @Sathish, I _think_ that some version of `do.call(rbind, LIST)` is what the OP is looking for, but the OP has not provided a very good reproducible example. It's not clear what the input Data objects are or what the output should be. Notice in the question that `Var3` and `Var6` of Data2 became `Var6` and `Var4`, respectively, in the output which the OP never explained. – GSee Sep 25 '12 at 18:20
  • @GSee it looks like the OP needs `rbind.fill`. @Satish rbinding like this can get veeeeeeeeeeery slow (order of several thousands of times slower than an effective solution) when the number of files grows larger. – Paul Hiemstra Sep 25 '12 at 18:25
  • @Satish, also it looks like your solution just rbinds the datasets and performs no merge... – Paul Hiemstra Sep 25 '12 at 18:26
  • @Satish I edited in a solution using `lapply` and `rbind.fill`. In addition, it does not need the files to be in `getwd`, but the user can define any path. Also, the user can select which file extension is read, e.g. `csv`. – Paul Hiemstra Sep 25 '12 at 18:33
  • @GSee: Thanks for the constructive comment. – Sathish Sep 25 '12 at 19:08
  • @Paul: Thanks for the edit and the constructive comment. I checked your code with a large number of files and it is much faster than the one I posted here. – Sathish Sep 25 '12 at 19:09
  • @PaulHiemstra: Please add -- library("plyr") -- in your edit. Thanks – Sathish Sep 25 '12 at 19:30
  • I'm not downvoting this answer, because I'm a wuss, but I'm strongly tempted to, because I really think that people should be reading @PaulHiemstra's answer rather than this one. I can appreciate that Satish worked hard on his answer but it's definitely poorer R code ... – Ben Bolker Sep 25 '12 at 21:53
0

Try this:

# read.table() already returns a data frame, so no as.data.frame() is needed
data1 <- read.table("data1", header=TRUE, sep=" ")
data2 <- read.table("data2", header=TRUE, sep=" ")
# all=TRUE already implies all.x=TRUE and all.y=TRUE
merge(data1, data2, all=TRUE)
Sathish
  • Thanks Sathish! Your code is similar to Sebastian's, and it works. But I also want to combine multiple datasets, and this code doesn't work for that. Sebastian gave me a clue in the comment above, but somehow it's not working for me. – Beta Sep 23 '12 at 14:33
  • Well, I saved the data files in the working directory, so read the files from your working directory and it will work. I think the problem may be that the file names in my code don't match yours. – Sathish Sep 23 '12 at 15:27
  • Thanks Sathish for the continued support! For 2 datasets, Data1 & Data2, your code and Sebastian's work perfectly fine. The problem is when I have more than 2 datasets, say 5. Then to do the merging I first have to merge Data1 & Data2 to create NewData, then merge NewData with Data3, and so on. I'm looking for a method that can merge multiple datasets at once. rbind.fill for some reason doesn't work for me, and neither does the "list" example suggested by Sebastian. – Beta Sep 23 '12 at 15:32