3

I've been looking through the documentation and Pandas Merging 101, but help was nowhere to be found.

Suppose I have two Dataframes as follows

>>> df

    Values  Count  Percentage
0    Apple      0           0
1   Banana      0           0
2  Samsung      0           0
3   Orange      0           0
>>> df2

    Values  Count  Percentage
0    Apple     14        0.74
1  Samsung      5        0.26

And I want to merge the two Dataframes to produce the following result

>>> result

    Values  Count  Percentage
0    Apple     14        0.74
1   Banana      0           0
2  Samsung      5        0.26
3   Orange      0           0

Keep in mind that df2 is always a subset of df.
In other words, every Values in df2 will be guaranteed to present in the Values of df.

Here is what I have so far

result = df.merge(df2, on='Values', how='left')
>>> result

    Values  Count_x  Percentage_x  Count_y  Percentage_y
0    Apple        0             0     14.0          0.74
1   Banana        0             0      NaN           NaN
2  Samsung        0             0      5.0          0.26
3   Orange        0             0      NaN           NaN

But the result is kind of disappointing.
Any help would be greatly appreciated.

  • Does this answer your question? [Python Pandas update a dataframe value from another dataframe](https://stackoverflow.com/questions/49928463/python-pandas-update-a-dataframe-value-from-another-dataframe) – DeGo May 09 '21 at 09:04

3 Answers3

2

Or you can use update. After setting the index, just update the value in df1 with new values:

df1 = df1.set_index('Values')
df1 = df1.update(df2.set_index('Values')).reset_index()
Nk03
  • 14,699
  • 2
  • 8
  • 22
1

I think what you want is a pd.concat followed by a .drop_duplicates()

>>> df1
    Values  Count  Percentage
0    Apple      0           0
1   Banana      0           0
2  Samsung      0           0
3   Orange      0           0
>>> df2
    Values  Count  Percentage
0    Apple     14        0.74
1  Samsung      5        0.26
>>> pd.concat([df1, df2]).drop_duplicates(subset='Values', keep='last')
    Values  Count  Percentage
1   Banana      0        0.00
3   Orange      0        0.00
0    Apple     14        0.74
1  Samsung      5        0.26
Gusti Adli
  • 1,225
  • 4
  • 13
1

You should:

  • set the index in df to Values and save the result in a temporary DataFrame,
  • update it with df2 with the index set also to Values (update is performed in place),
  • reset the index to convert Values to a regular column,
  • maybe save this result back under df.

The code to do it is:

wrk = df.set_index('Values')
wrk.update(df2.set_index('Values'))
df = wrk.reset_index()

The result is:

    Values  Count  Percentage
0    Apple   14.0        0.74
1   Banana    0.0        0.00
2  Samsung    5.0        0.26
3   Orange    0.0        0.00
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41