Merge of multiple data frames of different number of columns into one big data frame

Question

I have two CSV files with different number of columns and rows. The first CSV file has M columns and N rows, the second has H columns and G rows. Some of the columns have the same name.

I'd like to combine the two into data frame with following properties:

N+G rows
Union of (M, H) columns
if column A is element of first CSV file but not of second, the data frame should contain the same values in first N entries of A as in first CSV, and for the rest (since there is no A data in second CSV) should be NA.

Here is an example:

CSV1
City, Population,
Zagreb, 700000,
Rijeka, 142000

CSV2
City, Area,
Split, 200.00
Osijek, 171.00
Dubrovnik, 143.35

I'd like build a data frame that looks like this:

City      Population   Area
Zagreb    700000       NA
Rijeka    142000       NA
Split     NA           200.00
Osijek    NA           171.00
Dubrovnik NA           143.35

Also what if instead two CSV files I had two data frames and wanted to do the same, for example if I loaded first csv to df1 and second one in df2 and then wanted to make a merge to df3 that would look like example above.

score 4 · Accepted Answer · answered Nov 26 '12 at 22:59

4

Why not try the concat function:

In [25]: df1
Out[25]: 
     City  Population
0  Zagreb      700000
1  Rijeka      142000

In [26]: df2
Out[26]: 
        City    Area
0      Split  200.00
1     Osijek  171.00
2  Dubrovnik  143.35

In [27]: pd.concat([df1,df2])
Out[27]: 
     Area       City  Population
0     NaN     Zagreb      700000
1     NaN     Rijeka      142000
0  200.00      Split         NaN
1  171.00     Osijek         NaN
2  143.35  Dubrovnik         NaN

In [28]: pd.concat([df1,df2], ignore_index=True)
Out[28]: 
     Area       City  Population
0     NaN     Zagreb      700000
1     NaN     Rijeka      142000
2  200.00      Split         NaN
3  171.00     Osijek         NaN
4  143.35  Dubrovnik         NaN

Note: the concat does have some additional options if you have slightly different requirements.

answered Nov 26 '12 at 22:59

Andy Hayden

359,921
101
625
535

@enedene I have a large no. of Columns to concat i dont wanna write name of each of these how can i do this? – Jun 07 '16 at 19:07
@Danish I don't understand the question. please consider asking a separate question (that way more people can see it too). – Andy Hayden Jun 07 '16 at 19:43
yes i can ask a seperate question but this one one is quite simple. I am just asking that lets say I have 200 columns on which i have to concat 2 data frame so rather than writing * pd.concat([mdf1[['user', 'tag1', 'tag2', 'tag3', 'tag4' ]].groupby(['user']).agg(sum) * I can't write name of individually columns so there must be some way for which it takes all the columns after 'tag1' in dataframe. I hope you got my question @AndyHayden – Jun 08 '16 at 04:29
@Danish it takes a list, so you can do a list comprehension or something like that. ps. you can use .sum() rather than .agg(sum) – Andy Hayden Jun 08 '16 at 05:46
@AndyHayden I have already tried putting a list of columns in the argument but it is giving me error. Ideally it shouldn't? and yes thank you for .sum() – Jun 08 '16 at 05:52
@Danish It shouldn't be an error you do df[[col for col in df if col != "foo"]] or whatever. You'll definitely get a better/faster response and from more people if you post a question (it can also have the full problem/error etc.) :) – Andy Hayden Jun 08 '16 at 16:19

Merge of multiple data frames of different number of columns into one big data frame

1 Answers1

Linked