3

I have the following data frames (in reality they are more than 3).

import pandas as pd
df1 = pd.DataFrame({'head1': ['foo', 'bix', 'bar'],'val': [11, 22, 32]})
df2 = pd.DataFrame({'head2': ['foo', 'xoo', 'bar','qux'],'val': [1, 2, 3,10]})
df3 = pd.DataFrame({'head3': ['xoo', 'bar',],'val': [20, 100]})
# Note that the value in column 'head' is always unique

What I want to do is to merge them based on head column. And whenever the value of a head does not exist in one data frame we would assign it with NA.

In the end it'll look like this:

         head1   head2  head3
 -------------------------------
 foo     11      1       NA
 bix     22      NA      NA
 bar     32      3      100
 xoo     NA      2      20
 qux     NA      10     NA

How can I achieve that using Pandas?

neversaint
  • 60,904
  • 137
  • 310
  • 477
pdubois
  • 7,640
  • 21
  • 70
  • 99

1 Answers1

10

You can use pandas.concat selecting the axis=1 to concatenate your multiple DataFrames.

Note however that I've first set the index of the df1, df2, df3 to use the variables (foo, bar, etc) rather than the default integers.

import pandas as pd

df1 = pd.DataFrame({'head1': ['foo', 'bix', 'bar'],'val': [11, 22, 32]})
df2 = pd.DataFrame({'head2': ['foo', 'xoo', 'bar','qux'],'val': [1, 2, 3,10]})
df3 = pd.DataFrame({'head3': ['xoo', 'bar',],'val': [20, 100]})

df1 = df1.set_index('head1')
df2 = df2.set_index('head2')
df3 = df3.set_index('head3')


df = pd.concat([df1, df2, df3], axis = 1)

columns = ['head1', 'head2', 'head3']
df.columns = columns

print(df)

     head1  head2  head3
bar     32      3    100
bix     22    NaN    NaN
foo     11      1    NaN
qux    NaN     10    NaN
xoo    NaN      2     20
Ffisegydd
  • 51,807
  • 15
  • 147
  • 125