0

There are two pandas dataframes I have which I would like to combine with checking of two conditionals.

Dataframe1:

import pandas as pd 
data = [['Z085', '2020-08', 1.33], ['Z086', '2020-08', 1.83], ['Z086', '2020-09', 1.39]] 
df1 = pd.DataFrame(data, columns = ['SN', 'Date', 'Value']) 

enter image description here

Dataframe2:

data = [['Z085', '2020-08', 0.34], ['Z085', '2020-09', 0.83], ['Z086', '2020-09', 0.29]] 
df2 = pd.DataFrame(data, columns = ['SN', 'Date', 'ValueX']) 
df2 

enter image description here

I would like to merge or append or join them in order to get the folowing dataframe: The values ("Value" and "ValueX") are being add if both "SN" and "Date" are equal.

enter image description here

I am not sure, if a new dataframe is required or to map the df2 to the df1.

This is what i have tried:

df1['ValueX'] = df1[('Date', 'SN')].map(df2_mean.set_index('Date', 'SN')['ValueX'])

With one conditional (for example: Date) it works ok, but i am not able to set up two conditionals.

Essegn
  • 153
  • 1
  • 9
  • This is simply a `merge()` operation. Don't call the columns "conditionals", just say "merge on the columns SN, Date". Read: [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – smci Oct 09 '20 at 18:44
  • In your example output, the sort-order is different to what you get with `merge(..., sort=False)` Does it matter? What sort-order (if any) do you want on your output: `['SN','Date']` don't-care, or what? Please edit into your question. In general, we should avoid unnecessary sorting by key(s) on large data, it's wasteful and non-scaleable. – smci Oct 22 '20 at 00:27

2 Answers2

1

This is simply a merge() operation. Don't call the columns "conditionals", just say "merge on the columns SN, Date".

However pandas (v1.1.4) has a bug (its default is to use reversed i.e. 'ascending') key order when doing the sort) so you can't rely on it; note below it gets sorted by 'Date' then 'SN', i.e. wrong-way-around:

>>> dfnew_bad = df1.merge(df2, on=['SN','Date'], how='outer')

     SN     Date  Value  ValueX
0  Z085  2020-08   1.33    0.34
1  Z086  2020-08   1.83     NaN
2  Z086  2020-09   1.39    0.29
3  Z085  2020-09    NaN    0.83

So in your case to get the correct order by SN then Date:

dfnew_good = df1.merge(df2, on=['SN','Date'], how='outer', sort=False).sort_values(['SN', 'Date'])
     SN     Date  Value  ValueX
0  Z085  2020-08   1.33    0.34
3  Z085  2020-09    NaN    0.83
1  Z086  2020-08   1.83     NaN
2  Z086  2020-09   1.39    0.29

Note that there's a flag .sort_values(ascending=True) but not pd.merge() You could also workaround by doing pd.merge(..., sort=False) then dfnew_workaround.sort_index(..., inplace=True)

smci
  • 32,567
  • 20
  • 113
  • 146
  • I am a bit confused by second part of your answer. `sort=False` followed by `sort_values`, why not just `sort=True`. – Grayrigel Oct 18 '20 at 07:47
  • @Grayrigel: because pandas 1.1.x still has a bug and doesn't do the sort right when the key is compound: it sorts on `Date` then `SN`, i.e. it reverses the key order. – smci Dec 15 '20 at 05:52
1

Method 1: merge:

df_new = df1.merge(df2, on=['SN','Date'],how='outer', sort=True)
print(df_new)

Method 2 : join:

df_new = df1.join(df2.set_index(['SN','Date']), on=['SN','Date'],how='outer', sort=True)
print(df_new)

In this case, one more possible way would be to use pd.concat:

df_new = pd.concat([df1.set_index(['SN','Date']),df2.set_index(['SN','Date'])],axis=1).reset_index()

Output in either case:

     SN     Date  Value  ValueX
0  Z085  2020-08   1.33    0.34
3  Z085  2020-09    NaN    0.83
1  Z086  2020-08   1.83     NaN
2  Z086  2020-09   1.39    0.29
Grayrigel
  • 3,474
  • 5
  • 14
  • 32
  • **You don't need the `.sort_values('SN')`** at the end, just `.merge(..., sort=False)` per my answer. As long as both input dataframes were already sorted in increasing SN, no need to scramble them then re-sort them; just inefficient. – smci Oct 18 '20 at 05:07
  • @smci Thanks for the input. However, I added `.sort_values('SN')` to match the output format of original post. Nothing else! Also, `sort=False` is the default behavior. `sort=False` doesn't give the same output as the OP wants. I think you meant `sort=True`. In any case, I have updated my answer. **Can I get my downvote removed please?** – Grayrigel Oct 18 '20 at 07:54
  • Grayrigel: hmm, it's unclear if the OP cares about sort-order and if so what order they want to impose. Let's wait for them to respond. I didnt realize that was the intent of your multiple alternatives, suggest you add that as a comment. – smci Oct 22 '20 at 00:30