Pandas - Merging a dataframe

Question

I am new to pandas and I want to merge two dataframes but I am not sure how to go about doing it properly. So I have a two dataframe df and df1, where df1 is a subset of df. I have performed different operations on df1 and I have new columns and I want to merge it back to df.

   df
   Index col1 col2  col3                  
     1    a    1234  5a
     2    b    3456  c4
     3    c    3613  9y
     4    d    9103  0l
     5    e    8133  9m
     6    f    6790  7h

  df1
  Index col1 col2  col3            
     3    c    3613  9y
     5    e    8133  9m
     6    f    6790  7h

I start off with df and df1 and I apply an operation to df1 which results in:

    df1
    Index col1 col2  col3  col4  col5  
     3    c    3613  9y    qwrt  hkjg
     5    e    8133  9m    asdf  jgnm
     6    f    6790  7h    asdf  sajf

So my question now is how do I merge df1 back into df based on col1 to give something like this (essentially adding nans to the columns that were not part of df1 and preserving the order of the original dataframe df)

   Index col1 col2  col3  col4   col5 
     1    a    1234  5a     nan   nan
     2    b    3456  c4     nan   nan
     3    c    3613  9y     qwrt  hkjg           
     4    d    9103  0l     nan   nan
     5    e    8133  9m     asdf  jgnm
     6    f    6790  7h     asdf  sajf

Where are the values in col4 and col5 coming from. I don't see them in `df1`. What I think you want is `df.merge(df1, on=['col1', 'col2', 'col3'], how='left')` — Erfan, Oct 01 '19 at 10:55

score 2 · Answer 1 · answered Oct 01 '19 at 11:10

IIUC, you need this

df.merge(df1, how='outer')

Output

index   col1    col2    col3    col4    col5
0   1   a   1234    5a  NaN     NaN
1   2   b   3456    c4  NaN     NaN
2   3   c   3613    9y  qwrt    hkjg
3   4   d   9103    0l  NaN     NaN
4   5   e   8133    9m  asdf    jgnm
5   6   f   6790    7h  asdf    sajf

score 0 · Answer 2 · answered Oct 01 '19 at 10:45

0

This is all you need to do !!

pd.concat([df,df1], axis=0, ignore_index=True)

answered Oct 01 '19 at 10:45

Saswath Mishra

309
2
10

Dev Khadka · Answer 3 · 2019-10-01T11:12:00.890

0

you can do this, where "col4" and "col5" are columns added after your operations

df = pd.DataFrame([('1', 'a', '1234', '5a'), ('2', 'b', '3456', 'c4'), ('3', 'c', '3613', '9y'), ('4', 'd', '9103', '0l'), ('5', 'e', '8133', '9m'), ('6', 'f', '6790', '7h')], columns=('Index', 'col1', 'col2', 'col3'))
df1 = pd.DataFrame([('3', 'c', '3613', '9y', 'qwrt', 'hkjg'), ('5', 'e', '8133', '9m', 'asdf', 'jgnm'), ('6', 'f', '6790', '7h', 'asdf', 'sajf')], columns=('Index', 'col1', 'col2', 'col3', 'col4', 'col5'))
df = df.set_index("Index")
df1 = df1.set_index("Index")

pd.concat([df,df1[["col4", "col5"]]], axis=1, sort=False)

If you also want to update values of col1 and col2 from df1 to df2 use

df.update(df1)
pd.concat([df,df1[["col4", "col5"]]], axis=1, sort=False)

edited Oct 01 '19 at 11:12

answered Oct 01 '19 at 10:54

Dev Khadka

5,142
4
19
33

Did you check the output? – Erfan Oct 01 '19 at 10:56
yes, users "Index" column is actually Index like in example above so it will work. plz try the full example above @Erfan – Dev Khadka Oct 01 '19 at 11:02
Where do you see `col4` and `col5` in df1? – Erfan Oct 01 '19 at 11:11
pease see in user question "I start off with df and df1 and I apply an operation to df1 which results in:" after that df1 has those columns – Dev Khadka Oct 01 '19 at 11:13
No it's what he has as output, which is the incorrect output. – Erfan Oct 01 '19 at 11:39
what I get from above test is different, he has df and df1, after some processing on df1 he gets 2nd df1 which he want to merge back to df. So his questions starts after df1 has "col4" and "col5" – Dev Khadka Oct 01 '19 at 11:43

score 0 · Answer 4 · answered Oct 01 '19 at 11:20

If you use the subset of your dataframe you don't need to merge:

df = pd.DataFrame(np.random.randint(10, size=(4, 2)), columns=list('AB'))
print(df)

Output:

You can loc your subset and add a new column:

df['C'] = df.loc[1:2, 'B'] + 1
print(df)

Output:

   A  B    C
0  5  0  NaN
1  7  7  8.0
2  7  4  5.0
3  8  4  NaN

Pandas - Merging a dataframe

4 Answers4