1

I am new to pandas and I want to merge two dataframes but I am not sure how to go about doing it properly. So I have a two dataframe df and df1, where df1 is a subset of df. I have performed different operations on df1 and I have new columns and I want to merge it back to df.

   df
   Index col1 col2  col3                  
     1    a    1234  5a
     2    b    3456  c4
     3    c    3613  9y
     4    d    9103  0l
     5    e    8133  9m
     6    f    6790  7h

  df1
  Index col1 col2  col3            
     3    c    3613  9y
     5    e    8133  9m
     6    f    6790  7h

I start off with df and df1 and I apply an operation to df1 which results in:

    df1
    Index col1 col2  col3  col4  col5  
     3    c    3613  9y    qwrt  hkjg
     5    e    8133  9m    asdf  jgnm
     6    f    6790  7h    asdf  sajf

So my question now is how do I merge df1 back into df based on col1 to give something like this (essentially adding nans to the columns that were not part of df1 and preserving the order of the original dataframe df)

   Index col1 col2  col3  col4   col5 
     1    a    1234  5a     nan   nan
     2    b    3456  c4     nan   nan
     3    c    3613  9y     qwrt  hkjg           
     4    d    9103  0l     nan   nan
     5    e    8133  9m     asdf  jgnm
     6    f    6790  7h     asdf  sajf
Alessandrini
  • 181
  • 2
  • 12
  • Where are the values in col4 and col5 coming from. I don't see them in `df1`. What I think you want is `df.merge(df1, on=['col1', 'col2', 'col3'], how='left')` – Erfan Oct 01 '19 at 10:55

4 Answers4

2

IIUC, you need this

df.merge(df1, how='outer')

Output

index   col1    col2    col3    col4    col5
0   1   a   1234    5a  NaN     NaN
1   2   b   3456    c4  NaN     NaN
2   3   c   3613    9y  qwrt    hkjg
3   4   d   9103    0l  NaN     NaN
4   5   e   8133    9m  asdf    jgnm
5   6   f   6790    7h  asdf    sajf
moys
  • 7,747
  • 2
  • 11
  • 42
0

This is all you need to do !!

pd.concat([df,df1], axis=0, ignore_index=True)
Saswath Mishra
  • 309
  • 2
  • 10
0

you can do this, where "col4" and "col5" are columns added after your operations

df = pd.DataFrame([('1', 'a', '1234', '5a'), ('2', 'b', '3456', 'c4'), ('3', 'c', '3613', '9y'), ('4', 'd', '9103', '0l'), ('5', 'e', '8133', '9m'), ('6', 'f', '6790', '7h')], columns=('Index', 'col1', 'col2', 'col3'))
df1 = pd.DataFrame([('3', 'c', '3613', '9y', 'qwrt', 'hkjg'), ('5', 'e', '8133', '9m', 'asdf', 'jgnm'), ('6', 'f', '6790', '7h', 'asdf', 'sajf')], columns=('Index', 'col1', 'col2', 'col3', 'col4', 'col5'))
df = df.set_index("Index")
df1 = df1.set_index("Index")

pd.concat([df,df1[["col4", "col5"]]], axis=1, sort=False)

If you also want to update values of col1 and col2 from df1 to df2 use

df.update(df1)
pd.concat([df,df1[["col4", "col5"]]], axis=1, sort=False)
Dev Khadka
  • 5,142
  • 4
  • 19
  • 33
0

If you use the subset of your dataframe you don't need to merge:

df = pd.DataFrame(np.random.randint(10, size=(4, 2)), columns=list('AB'))
print(df)

Output:

   A  B
0  5  0
1  7  7
2  7  4
3  8  4

You can loc your subset and add a new column:

df['C'] = df.loc[1:2, 'B'] + 1
print(df)

Output:

   A  B    C
0  5  0  NaN
1  7  7  8.0
2  7  4  5.0
3  8  4  NaN
Mykola Zotko
  • 15,583
  • 3
  • 71
  • 73