Python pandas Grouping in a different way

Question

I have many dataframes read from pdf files. And they look like this:

e.g order 1 - this is a dataframe:
code    description   price   quantity       
000001  product A       1       10
000002  product B       2       20
000003  product C       3       30 
...

order 2 - this is a dataframe:
code    description   price   quantity
000001  product A       1       100
000002  product B       2       20
000004  product D       4       40

There will be orders 3, 4 etc I like to join them up and group by the code(which is unique) but display separately the quantities.

code    description   price   order1 quantity  order2 quantity
000001  product A       1       10              100
000002  product B       2       20              20
000003  product C       3       30               0 
000004  product D       4        0              40

Apart from using tedious loops, I wonder if there is a cleaner way to achieve this in pandas. I used to process this in excel vba previously using ugly loops and is my first time trying it in pandas.

Thanks alot for any help!

You need to concat and drop duplicates. Check [this](https://stackoverflow.com/a/41181829/8353711) — shaik moeed, Jul 16 '19 at 06:58
Possible duplicate of [Pandas/Python: How to concatenate two dataframes without duplicates?](https://stackoverflow.com/questions/21317384/pandas-python-how-to-concatenate-two-dataframes-without-duplicates) — shaik moeed, Jul 16 '19 at 06:59

jezrael · Accepted Answer · 2019-07-17T05:32:10.400

You can create MultiIndex Series by DataFrame.set_index or aggregate sum in list comprehension and then join together by concat:

dfs = [df1, df2, ..., dfn]

#if triples code, description, price are unique
L = [x.set_index(['code', 'description', 'price'])['quantity'] for x in dfs]

#if triples are not unique and necessary aggregate sum
L = [x.groupby(['code', 'description', 'price'])['quantity'].sum() for x in dfs]

df1 = (pd.concat(L, axis=1, keys=range(1, len(dfs) + 1))
         .add_prefix('quantity')
         .fillna(0)
         .astype(int)
         .reset_index())
print (df1)
     code description  price  quantity1  quantity2
0  000001   product A      1         10        100
1  000002   product B      2         20         20
2  000003   product C      3         30          0
3  000004   product D      4          0         40

EDIT:

For set default column names for each DataFrame in list comprehension is possible by DataFrame.set_axis:

L = [x.set_axis(np.arange(len(x.columns)), axis=1, inplace=False).set_index([0,1,2])[3] 
                                             for x in dfs]

Thanks for input, before i can try out these solutions, I need rework my dataframe. Do you know how to remove the annoying column indexing? I mean each row already has a index, so each row is labeled 0 1 2 etc. But I have the same 0,1,2 etc for the columns? I believe this is preventing me from calling .columns and giving my dataframe column names. Would you have any idea? — J T, Jul 16 '19 at 21:23
I cannot express my gratitude, you answered me within like 10 minutes of posting my question and this works as per what I wanted. For me to do this in excel vba would have taken me a while and did not want to go down this route. Thanks alot jezrael senpai. — J T, Jul 18 '19 at 18:23

Python pandas Grouping in a different way

1 Answers1