pandas DataFrame pivot table sum function not correct

Question

As the title mentions,

diag_code = df.pivot_table('PayabletoProvider',rows='DiagnosisCode',aggfunc=sum)

After applying the pivot function to my df, I am returned with data that dont make sense:

DiagnosisCode ------ PayabletoProvider

002 -------------------- NaN

003 -------------------- NaN

004 -------------------- 1824.70143.80220.001640.2540.00282.00946.31250...

005 -------------------- NaN

006 -------------------- 3324.95432.482400.65

Dont understand why this happens, as results in Excel seem to be ok and sum accordingly.

Would appreciate if somebody can help me fix this problem.

What are the data types? If you have mixed types that could cause the problem. For example, you can't sum a mix of strings and floats in pandas but Excel would silently drop the string value and sum the floats. — jmz, Mar 03 '14 at 09:11
@rauparaha I used Excel to check whether IsNumber and all True. Might it be different number types that will also cause this problem? Is there anyway I can replicate the function like Excel where it drops string types? I have started with a small set of data, but I am using Pandas because the data size will span to a couple million where Excel will just fail. — BernardL, Mar 03 '14 at 09:14
I don't know how you're importing but a mistake in eg. a CSV import where you don't specify all NA markers can cause incorrect detection of types. Use `df.info()` to check that your columns are the correct type and not `object`. Dropping strings without identifying the root of the problem should be a last resort but [look here](http://stackoverflow.com/questions/12725417/drop-non-numeric-columns-from-a-pandas-dataframe) for how. — jmz, Mar 03 '14 at 09:21
@rauparaha I guess then to avoid, the first step in importing is to ensure all data types are correctly declared and all NA markers are specified? Thanks for your help. — BernardL, Mar 03 '14 at 09:27

score 0 · Answer 1 · answered Feb 22 '17 at 13:33

0

Old question but the answer might be that aggfunc expects "np.sum" instead of sum.

answered Feb 22 '17 at 13:33

Pelican

1 Answers1