So I have this big list of dataframes, and some of them have matching columns and others do not. I want to rbind the ones with matching columns and merge the others that do not have matching columns (based on variables Year, Country). However, I don't want to go through all of the dataframes by hand to see which ones have matching columns and which do not.

Now I was thinking that it would look something along the lines of this:

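library(readstata13)  # provides read.dta13()

myfiles <- list.files(pattern = "\\.dta$")  # pattern is a regex, not a glob
dflist <- lapply(myfiles, read.dta13)

for (i in seq_along(dflist)) {
  # if the column names match those of another data frame:
  #   put them in a list and rbindlist() it
  # else:
  #   put them in another list and merge()
}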

Apart from not knowing how to do this in R exactly, I'm starting to think this wouldn't work after all.

To illustrate consider 6 dataframes:

Dataframe 1:                  Dataframe 2:

Country Sector Emp            Country Sector Emp
Belg    A      35             NL      B      31
Aus     B      12             CH      D      45
Eng     E      18             RU      D      12

Dataframe 3:                  Dataframe 4:

Country Flow PE               Country Flow PE
NL      6    13               ...     ...  ...
HU      4    11               ...     ...
LU      3    21               ...

Dataframe 5:                  Dataframe 6:

Country Year Exp              Country Year Imp
GER     02   44               BE      00   34
GER     03   34               BE      01   23
GER     04   21               BE      02   41

In this case I would want to rbind(dataframe 1, dataframe 2) and rbind(dataframe 3, dataframe 4), and I would like to merge dataframe 5 and dataframe 6 based on the variables Country and Year. So my output would be several rbinded/merged dataframes.

Oscar
  • I think merge (with all=TRUE) will rbind (though more slowly), so it might work to just merge them all together. For ways to do that, see http://stackoverflow.com/questions/8091303/simultaneously-merge-multiple-data-frames-in-a-list – Aaron left Stack Overflow Jul 14 '16 at 23:55
  • any update for this? did you find a way to solve this? – Stataq Nov 23 '20 at 06:32

2 Answers

0

rbind() will fail if the column names are not the same. As suggested in the comments, you can use merge() or left_join() from the dplyr package.

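Maybe this will work: Reduce(left_join, dflist). Note that do.call(left_join, dflist) only works for a list of exactly two data frames, because left_join() joins two tables at a time and a third list element would be passed as its by argument.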
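To avoid checking the data frames by hand, one option is to key each one by its column names, rbind the ones that share a key, and then merge whatever is left on Country and Year. The following is only a minimal base R sketch under some assumptions: the toy `dflist` stands in for the list read from the .dta files, the helper names (`keys`, `groups`, `stacked`, `has_keys`) are made up for illustration, and at least one stacked data frame is assumed to contain both Country and Year.

```r
# Toy stand-in for the list of data frames from the question (values illustrative).
dflist <- list(
  data.frame(Country = c("Belg", "Aus"), Sector = c("A", "B"), Emp = c(35, 12)),
  data.frame(Country = c("NL", "CH"), Sector = c("B", "D"), Emp = c(31, 45)),
  data.frame(Country = "GER", Year = 2, Exp = 44),
  data.frame(Country = "GER", Year = 2, Imp = 34)
)

# Build one key per data frame from its sorted column names,
# so that column order does not matter.
keys <- vapply(dflist, function(d) paste(sort(names(d)), collapse = "|"),
               character(1))

# Group data frames with identical column sets and stack each group with rbind.
groups <- split(dflist, keys)
stacked <- lapply(groups, function(g) do.call(rbind, g))

# Of the stacked results, merge those that carry both Country and Year.
has_keys <- vapply(stacked,
                   function(d) all(c("Country", "Year") %in% names(d)),
                   logical(1))
merged <- Reduce(
  function(x, y) merge(x, y, by = c("Country", "Year"), all = TRUE),
  stacked[has_keys]
)

# Final output: the stacked frames without Country/Year, plus the merged frame.
result <- c(unname(stacked[!has_keys]), list(merged))
```

Using all = TRUE keeps rows that appear in only one of the merged data frames, which a left join would drop from the right-hand side.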

Ulrik
0

For data frames with the same columns, you could use a union or union all operation (for example union() and union_all() in dplyr). A union removes duplicate rows; if you need to keep duplicate entries, use union all. Use union or union all for data frames 1 and 2, and again for data frames 3 and 4. For data frames 5 and 6, use

merge(x= dataframe5, y=dataframe6, by=c("Country", "Year"), all=TRUE)
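As a rough illustration of the difference, here is a base R sketch that avoids extra packages: rbind() plays the role of union all and unique() on the result plays the role of union. The data are taken from the question, except for one duplicated row ("Eng") that is added to data frame 2 purely to make the difference visible, and Year is stored as text to keep the leading zeros. The variable names are illustrative.

```r
# Data frames 1 and 2 from the question; the last row of df2 is an added
# duplicate of a df1 row, included only for illustration.
df1 <- data.frame(Country = c("Belg", "Aus", "Eng"),
                  Sector = c("A", "B", "E"),
                  Emp = c(35, 12, 18))
df2 <- data.frame(Country = c("NL", "CH", "RU", "Eng"),
                  Sector = c("B", "D", "D", "E"),
                  Emp = c(31, 45, 12, 18))

all_rows <- rbind(df1, df2)        # "union all": keeps the duplicated row
distinct_rows <- unique(all_rows)  # "union": drops the duplicated row

# Data frames 5 and 6 from the question.
df5 <- data.frame(Country = "GER", Year = c("02", "03", "04"),
                  Exp = c(44, 34, 21))
df6 <- data.frame(Country = "BE", Year = c("00", "01", "02"),
                  Imp = c(34, 23, 41))

# Full outer merge on Country and Year; unmatched cells are filled with NA.
merged <- merge(x = df5, y = df6, by = c("Country", "Year"), all = TRUE)
```

Because no Country and Year combination appears in both data frames 5 and 6, the merged result keeps every row from each and fills the missing Exp or Imp values with NA.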