Pandas: Create a new Data Frame using multiple GroupBy results

Question

My data is a DataFrame of retail items and their sales performance. Columns include 2016 unit sales, 2015 unit sales, item description, etc. When I try to do a groupby on brand:

Data.groupby(by="Major Brand").sum()

I get the following error: TypeError: unorderable types: int() < str()

I assume this is because not all of the data in the DataFrame are numbers, so pandas doesn't know how to 'sum' them.

But I can get individual groupby's using something like:

Data.groupby(by="Major Brand")["2016 Units"].sum()

Ultimately I just want to group by "Major Brand", compare "2016 Units" to "2015 Units", and put all three of them into a new DataFrame with "Major Brand" as the index.

I have tried merging my multiple groupby's together but that never seems to work.

Thank you!

MaxU - stand with Ukraine · Accepted Answer · 2016-06-12T21:42:54.087

2

You can do it this way, selecting both columns with a list:

Data.groupby(by="Major Brand")[["2016 Units","2015 Units"]].sum()

Demo:

In [29]: Data.groupby(by="Major Brand")[["2016 Units","2015 Units"]].sum()
Out[29]:
             2016 Units  2015 Units
Major Brand
1                   218         238
2                   172         122
3                   192         273
4                   176         172

Data:

In [30]: Data
Out[30]:
    Major Brand  2016 Units  2015 Units    X
0             1          75          83  xxx
1             1          82          95  xxx
2             3          85          47  xxx
3             3           1          40  xxx
4             1          43          43  xxx
5             4          35          65  xxx
6             3          38          71  xxx
7             4          56          90  xxx
8             3           9          77  xxx
9             1          18          17  xxx
10            3          59          38  xxx
11            4          85          17  xxx
12            2          64          13  xxx
13            2          32          33  xxx
14            2          76          76  xxx
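For readers who want to reproduce this, here is a minimal sketch that rebuilds the sample data above and produces the grouped frame. It selects the columns with a list of names (double brackets), which is the form current pandas versions require. The "Change" column name is an assumption added only to illustrate the 2016 vs. 2015 comparison asked about in the question, and the `pd.concat` variant is one possible way to combine separately computed groupby results, not the only one.

```python
import pandas as pd

# Sample data copied from the demo above
Data = pd.DataFrame({
    "Major Brand": [1, 1, 3, 3, 1, 4, 3, 4, 3, 1, 3, 4, 2, 2, 2],
    "2016 Units": [75, 82, 85, 1, 43, 35, 38, 56, 9, 18, 59, 85, 64, 32, 76],
    "2015 Units": [83, 95, 47, 40, 43, 65, 71, 90, 77, 17, 38, 17, 13, 33, 76],
    "X": ["xxx"] * 15,
})

cols = ["2016 Units", "2015 Units"]

# Group once and sum only the two numeric columns of interest;
# "Major Brand" becomes the index of the result
summary = Data.groupby("Major Brand")[cols].sum()

# Hypothetical extra column for the year-over-year comparison
summary["Change"] = summary["2016 Units"] - summary["2015 Units"]

# Alternative: combine two separately computed groupby Series side by side
grouped = Data.groupby("Major Brand")
combined = pd.concat(
    [grouped["2016 Units"].sum(), grouped["2015 Units"].sum()], axis=1
)

print(summary)
```

Selecting the columns before calling `sum()` keeps the text column "X" out of the aggregation entirely.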

edited Jun 12 '16 at 21:42

answered Jun 12 '16 at 21:37


For some reason I get the "TypeError: unorderable types: str() < int()" error. I can do the groupby for both "2016 Units" and "2015 Units" separately, so I have no idea why I would get this message. – Stephen Jun 12 '16 at 21:42
@Stephen, always try to provide a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) when asking questions. For _pandas_ questions, please provide sample _input_ and _output_ data sets (5-7 rows in CSV/dict/JSON/Python code format _as text_, so that others can use them when coding an answer for you). This helps avoid _situations_ like `your code isn't working for me` or `it doesn't work with my data`. – MaxU - stand with Ukraine Jun 12 '16 at 21:44
@Stephen, could you also post the output of the following command: `print(Data.dtypes)` – MaxU - stand with Ukraine Jun 12 '16 at 21:46
Sorry, I wish I could provide the whole data set, but it is huge. There are about 50 columns and 900 rows. The list printed by print(Data.dtypes) shows "Major Brand", "Product Description", etc. as "object" and "2016 Units", "2015 Units" as "float64". Basically, the columns that contain text are "object" and the columns that contain numbers are "float64". At the end of the list it says dtype: object. – Stephen Jun 12 '16 at 21:57
@Stephen, what pandas version are you using? – MaxU - stand with Ukraine Jun 12 '16 at 22:01
Looks to be: '0.18.0' – Stephen Jun 12 '16 at 22:06
@Stephen, sorry, I can't reproduce your error. I tried putting some strings into one `float64` column, but pandas was smart enough to simply leave that column out of the result when applying `sum()` – MaxU - stand with Ukraine Jun 12 '16 at 22:09
Thank you for the help, I'm sure there's some weird inconsistency with my data or something. I will keep messing with it. Just out of curiosity, would NaN values contribute to this? I'm sure there are some floating in the data. – Stephen Jun 12 '16 at 22:23
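The thread never pins down the cause, so the following is only a diagnostic sketch, not a confirmed fix. It counts which Python types actually occur in each `object` column. A missing value (NaN) is stored as a `float`, so an `object` column that holds both text and NaN already contains mixed types. The sample frame and the helper name `type_counts` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the brand column mixes str, int and NaN
df = pd.DataFrame({
    "Major Brand": ["nike", 3, np.nan, "adidas"],
    "2016 Units": [1.0, 2.0, 3.0, 4.0],
})


def type_counts(frame):
    """Return {column: {type name: count}} for every object-dtype column."""
    result = {}
    for col in frame.columns:
        if frame[col].dtype == object:
            names = frame[col].map(lambda value: type(value).__name__)
            result[col] = names.value_counts().to_dict()
    return result


mixed = type_counts(df)
print(mixed)
```

A column that reports more than one type name is a candidate for cleaning before grouping or summing.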

score 1 · Answer 2 · answered Jun 13 '16 at 08:07

I get the following error: TypeError: unorderable types: int() < str()

Could it be that your dtypes are not correct, e.g. str instead of int? You could try creating your DataFrame along these lines:

In [18]: import numpy as np; import pandas as pd

In [19]: col1 = ['adidas','nike','yourturn','zara','nike','nike','bla','bla','zalando','amazon']

In [20]: data = {'Major Brand':col1, '2016 Units':range(len(col1)), '2015 Units':range(len(col1),len(col1)*2)}

In [21]: x = pd.DataFrame(data, dtype=np.int64  )

In [22]: x.groupby(by="Major Brand").sum()
Out[22]: 
             2015 Units  2016 Units
Major Brand                        
adidas               10           0
amazon               19           9
bla                  33          13
nike                 40          10
yourturn             12           2
zalando              18           8
zara                 13           3

In [23]: x.groupby(by="Major Brand")[["2016 Units","2015 Units"]].sum()
Out[23]: 
             2016 Units  2015 Units
Major Brand                        
adidas                0          10
amazon                9          19
bla                  13          33
nike                 10          40
yourturn              2          12
zalando               8          18
zara                  3          13

In [24]: x.dtypes
Out[24]: 
2015 Units      int64
2016 Units      int64
Major Brand    object
dtype: object

In [25]: x.groupby(by="Major Brand").agg(['count','sum','mean','median'])
Out[25]: 
            2015 Units                       2016 Units                     
                 count sum       mean median      count sum      mean median
Major Brand                                                                 
adidas               1  10  10.000000   10.0          1   0  0.000000    0.0
amazon               1  19  19.000000   19.0          1   9  9.000000    9.0
bla                  2  33  16.500000   16.5          2  13  6.500000    6.5
nike                 3  40  13.333333   14.0          3  10  3.333333    4.0
yourturn             1  12  12.000000   12.0          1   2  2.000000    2.0
zalando              1  18  18.000000   18.0          1   8  8.000000    8.0
zara                 1  13  13.000000   13.0          1   3  3.000000    3.0

Sorry, I tried and got: "ValueError: invalid literal for int() with base 10: 'Data'" – Stephen Jun 13 '16 at 11:54
@Stephen, have you also tried with the example data I created? – PlagTag Jun 13 '16 at 14:13
@Stephen, to me it strongly looks like you have a type conversion error, e.g. some field in your raw data that can't be converted to an int. – PlagTag Jun 13 '16 at 14:21
The weird thing is that I can do the individual groupby's no problem: `Data.groupby(by="Major Brand")["2016 Units"].sum()`. Both 2016 and 2015 work fine; the issue only appears if I try to do them together. I will try with your data when I get to my programming comp. – Stephen Jun 13 '16 at 17:51
@Stephen, what version do you have? I can't reproduce your error, so it's very hard to debug. Try: "import sys; sys.version" and "import pandas as pd; pd.__version__" – PlagTag Jun 14 '16 at 07:03
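If a type conversion problem is suspected, as suggested in the comments above, one possible way to locate the offending rows is `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into NaN instead of raising. This is a sketch on made-up data. The `"n/a"` entry and the variable names are assumptions for illustration, not taken from the asker's data set.

```python
import pandas as pd

# Hypothetical raw data: one unit count was read in as unparseable text
raw = pd.DataFrame({
    "Major Brand": ["nike", "nike", "zara", "zara"],
    "2016 Units": ["10", "n/a", "7", "3"],
    "2015 Units": [4, 5, 6, 7],
})

cols = ["2016 Units", "2015 Units"]

# Convert column by column; values that cannot be parsed become NaN
converted = raw[cols].apply(pd.to_numeric, errors="coerce")

# Rows where a value was present but could not be converted
failed = converted.isna() & raw[cols].notna()
bad_rows = raw[failed.any(axis=1)]

# Replace the original columns with the numeric versions, then aggregate
clean = raw.copy()
clean[cols] = converted
summary = clean.groupby("Major Brand")[cols].sum()

print(bad_rows)
print(summary)
```

Note that `sum()` skips NaN by default, so coerced values silently drop out of the totals. Inspecting `bad_rows` first shows what would be lost.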
