0

Let's say I have these two dataframes:

>>> import pandas as pd

>>> df1 = pd.DataFrame({"key":[1,3,5], "columnA":[30,40,50]})
>>> df1
   key  columnA
0    1       30
1    3       40
2    5       50

>>> df2 = pd.DataFrame({"key":[2,4], "columnB":[60,70]})
>>> df2
   key  columnB
0    2       60
1    4       70

I basically want a new dataframe, with "key", "columnA", and "columnB", where the corresponding data is "interleaved" from the two above dataframes, correspondingly. I did this:

>>> pd.merge(df1, df2, on='key', how='outer').astype('Int64')
   key  columnA  columnB
0    1       30     <NA>
1    3       40     <NA>
2    5       50     <NA>
3    2     <NA>       60
4    4     <NA>       70

... which comes close - but I want the output to be:

   key  columnA  columnB
0    1       30     <NA>
1    2     <NA>       60
2    3       40     <NA>
3    4     <NA>       70
4    5       50     <NA>

How can I achieve that?

sdaau
  • 36,975
  • 46
  • 198
  • 278
  • 1
    Is possible use `pd.merge(df1, df2, on='key', how='outer').astype('Int64').sort_values('key')` ? – jezrael Apr 09 '20 at 07:52
  • Thanks @jezrael - that mostly works, except the resulting index column is [0, 3, 1, 4, 2], - and I want [0, 1, 2, 3, 4] as in example. – sdaau Apr 09 '20 at 07:53

2 Answers2

3

You can use sort_values and then reset_index to achieve the expected output.

In [778]: pd.merge(df1, df2, on='key', how='outer').astype('Int64').sort_values('key').reset_index().drop('index',1)
Out[778]: 
   key  columnA  columnB
0    1       30     <NA>
1    2     <NA>       60
2    3       40     <NA>
3    4     <NA>       70
4    5       50     <NA>

Or you can pass ignore_index=True in the sort_values parameter itself:

In [795]: pd.merge(df1, df2, on='key', how='outer').astype('Int64').sort_values('key', ignore_index=True)
Out[795]: 
   key  columnA  columnB
0    1       30     <NA>
1    2     <NA>       60
2    3       40     <NA>
3    4     <NA>       70
4    5       50     <NA>
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
1

Use DataFrame.sort_values with DataFrame.reset_index and drop=True parameter:

df = (pd.merge(df1, df2, on='key', how='outer')
        .astype('Int64')
        .sort_values('key')
        .reset_index(drop=True))
print (df)
   key  columnA  columnB
0    1       30      NaN
1    2      NaN       60
2    3       40      NaN
3    4      NaN       70
4    5       50      NaN
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252