python Pandas groupby method

Question

I am having an issue using the groupby functions in the pandas module. I get the error: DataError: No numeric types to aggregate

I am not sure what I am doing wrong; there is numerical data in the DataFrame.

Below is my code:

import pandas as pd

lte_columns = ['Period start','Period end','zone','usid','site id','rank','Total LCQI Impact','LTE BLOCK Impact','LTE DROP Impact','LTE TPUT Impact','engineer notes']
lte_df = pd.DataFrame(dtype=float)

## iterate over the CQI impact file, separate LTE from UMTS and perform lookup for each Technology/USID
## lte_lookup (not shown) maps each USID to a record whose second element is the site id
testFile = "sample_CSCT_CQI_IMPACT_Greater Midwest_20160305_20160311.xls"
df = pd.read_excel(testFile,sheetname="Sheet1")

weekBegin = df['Date'].min()
weekEnd = df['Date'].max()

## update new dataFrames while iterating over input dataframe

for idx, row in df.iterrows():
    usid = row['USID']
    region, zone = row['District & Zone'].split('-')
    if usid in lte_lookup:
        site_id = lte_lookup[usid][1]
    else:
        site_id = "N/A"

    lte = pd.Series([weekBegin,weekEnd,zone,usid,site_id,'0','0','0','0','0','0'])
    lte_df = lte_df.append(lte,ignore_index=True)

lte_df.columns = lte_columns
grps = lte_df.groupby(['usid'])
avgs = grps.mean()
avgs.to_excel("pandas_out.xlsx",merge_cells=False) 

print("done")

Here is a sample of what lte_df looks like:

>>> print lte_df
     Period start  Period end zone      usid    site id rank Total LCQI Impact LTE BLOCK Impact LTE DROP Impact LTE TPUT Impact engineer notes
0      03/05/2016  03/11/2016  69E   56788.0   MOL02607    0                 0                0               0               0              0
1      03/05/2016  03/11/2016  70F   58438.0   KSL05065    0                 0                0               0               0              0
2      03/05/2016  03/11/2016  69A  120595.0  MOL00531W    0                 0                0               0               0              0
3      03/05/2016  03/11/2016  70D   75566.0   KSL04272    0                 0                0               0               0              0
4      03/05/2016  03/11/2016  70F   58454.0   KSL05106    0                 0                0               0               0              0
5      03/05/2016  03/11/2016  70E   41793.0   KSL04151    0                 0                0               0               0              0
6      03/05/2016  03/11/2016  70C    9500.0   KSL06382    0                 0                0               0               0              0
7      03/05/2016  03/11/2016  69A   56586.0   MOL01143    0                 0                0               0               0              0


>>> lte_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6565 entries, 0 to 6564
Data columns (total 11 columns):
Period start         6565 non-null object
Period end           6565 non-null object
zone                 6565 non-null object
usid                 6565 non-null float64
site id              6565 non-null object
rank                 6565 non-null object
Total LCQI Impact    6565 non-null object
LTE BLOCK Impact     6565 non-null object
LTE DROP Impact      6565 non-null object
LTE TPUT Impact      6565 non-null object
engineer notes       6565 non-null object
dtypes: float64(1), object(10)
memory usage: 615.5+ KB
>>>

add the output of `lte_df.info()` -- the error suggests that all of your columns are being read as objects instead of ints or floats — Paul H, Mar 23 '16 at 22:31
see also this: http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Paul H, Mar 23 '16 at 22:32
the only column that is numeric is the column on which you're creating your groups. having numeric columns will make this issue go away. — Paul H, Mar 23 '16 at 23:11
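Following that suggestion, the impact columns can be kept numeric from the start. Below is a minimal sketch of the question's loop that stores the integer 0 instead of the string '0', and collects the rows in a plain list before building the DataFrame once (row-by-row `DataFrame.append` is slow, was deprecated in pandas 1.4 and removed in 2.0). The `df` and `lte_lookup` values are hypothetical stand-ins for the Excel sheet and the asker's lookup table, and leaving 'engineer notes' empty is an assumption.

```python
import pandas as pd

lte_columns = ['Period start', 'Period end', 'zone', 'usid', 'site id', 'rank',
               'Total LCQI Impact', 'LTE BLOCK Impact', 'LTE DROP Impact',
               'LTE TPUT Impact', 'engineer notes']

# Hypothetical stand-ins for the Excel sheet and the asker's lte_lookup dict
df = pd.DataFrame({
    'Date': ['03/05/2016', '03/11/2016', '03/08/2016'],
    'USID': [56788, 58438, 56788],
    'District & Zone': ['GM-69E', 'GM-70F', 'GM-69E'],
})
lte_lookup = {56788: ('LTE', 'MOL02607')}

weekBegin = df['Date'].min()
weekEnd = df['Date'].max()

rows = []  # collect plain lists; build the DataFrame once after the loop
for _, row in df.iterrows():
    usid = row['USID']
    region, zone = row['District & Zone'].split('-')
    site_id = lte_lookup[usid][1] if usid in lte_lookup else "N/A"
    # numeric 0 (not the string '0') keeps the impact columns numeric
    rows.append([weekBegin, weekEnd, zone, usid, site_id, 0, 0, 0, 0, 0, ''])

lte_df = pd.DataFrame(rows, columns=lte_columns)

# numeric_only=True skips text columns such as 'zone' and 'site id'
avgs = lte_df.groupby('usid').mean(numeric_only=True)
print(avgs)
```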

score 0 · Answer 1 · answered Sep 10 '16 at 16:23

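Based on the data in your DataFrame, your groupby fails because `mean()` tries to compute a mean for every column other than the grouping key, and it can't: apart from `usid` (the column you group by), all of your columns have dtype `object`, not float. Even the zeros in the other columns are strings, because the Series you append is built with `'0'` rather than `0`.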
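A quick way to confirm this is to check which columns pandas actually treats as numeric. The sketch below uses a small hypothetical frame shaped like `lte_df`, with the impact values stored as the string '0' as in the question; `select_dtypes(include='number')` lists the columns that `mean()` could average.

```python
import pandas as pd

# Small stand-in for lte_df: impact values are the string '0', as in the question
lte_df = pd.DataFrame({
    'usid': [56788.0, 58438.0, 56788.0],
    'zone': ['69E', '70F', '69E'],
    'rank': ['0', '0', '0'],
    'LTE DROP Impact': ['0', '0', '0'],
})

print(lte_df.dtypes)  # usid is float64; the other columns hold non-numeric strings

# Columns that groupby().mean() could actually average
numeric_cols = lte_df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)  # only 'usid', which is the grouping key itself
```

Since the only numeric column is the one used for grouping, nothing is left to aggregate.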

So this won't work:

grps = lte_df.groupby(['usid'])
avgs = grps.mean()

But this, for example:

grps = lte_df[['Period start', 'usid']].groupby(['Period start'])
avgs = grps.mean()

will work, because it groups by one column ('Period start') and the only remaining column, usid, is a float, so there is something to average. I realize this is not what you were trying to do, but it shows how groupby().mean() behaves once a numeric column is present.
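To get the per-USID averages the question was after, one option is to convert the string columns to numbers before grouping. The sketch below assumes the impact columns only hold numeric text such as '0'; `errors='coerce'` turns anything unparseable into NaN, which `mean()` then skips. The sample values are hypothetical. Selecting the impact columns explicitly after `groupby` also matters on newer pandas: since 2.0, `mean()` no longer drops non-numeric columns like 'zone' by default.

```python
import pandas as pd

impact_cols = ['rank', 'Total LCQI Impact', 'LTE BLOCK Impact',
               'LTE DROP Impact', 'LTE TPUT Impact']

# Hypothetical stand-in for lte_df with the impact values stored as strings
lte_df = pd.DataFrame({
    'usid': [56788.0, 58438.0, 56788.0],
    'zone': ['69E', '70F', '69E'],
    'rank': ['1', '2', '3'],
    'Total LCQI Impact': ['0', '4', '2'],
    'LTE BLOCK Impact': ['0', '0', '0'],
    'LTE DROP Impact': ['0', '0', '0'],
    'LTE TPUT Impact': ['0', '0', '0'],
})

# Convert each impact column to a numeric dtype; unparseable text becomes NaN
lte_df[impact_cols] = lte_df[impact_cols].apply(pd.to_numeric, errors='coerce')

# Average only the impact columns for each usid
avgs = lte_df.groupby('usid')[impact_cols].mean()
print(avgs)
```

Converting at the source (storing `0` instead of `'0'` when the rows are built) avoids this extra step entirely; `to_numeric` is mainly useful when the strings come from a file you don't control.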
