3

I am trying to pivot following type of sample data in Pandas dataframe in Python. I came across couple of other stackoverflow answers that discussed how to do the pivot: pivot_table No numeric types to aggregate

However, when I use pivot_table(), I am able to pivot the data. But when I use set_index() and unstack(), I get following error:

AttributeError: 'NoneType' object has no attribute 'unstack'

Sample Data:

id  responseTime    label   answers
ABC 2018-06-24  Category_1  [3]
ABC 2018-06-24  Category_2  [10]
ABC 2018-06-24  Category_3  [10]
DEF 2018-06-25  Category_1  [7]
DEF 2018-06-25  Category_8  [10]
GHI 2018-06-28  Category_3  [7]

Desired Output:

id  responseTime    category_1  category_2 category_3 category_8
ABC  2018-06-24           [3]     [10]         [10]       NULL
DEF  2018-06-25           [7]     NULL         NULL       [10]
GHI  2018-06-28           NULL    NULL         [7]        NULL

This works:

 df=pdDF.pivot_table(index=['items_id','responseTime'], columns='label', values='answers', aggfunc='first') 

This doesn't work:

pdDF.set_index(['items_id','responseTime','label'], append=True, inplace=True).unstack('label')

I also used pdDF[pdDF.isnull().any(axis=1)] to make sure I don't have any NULL data in answers column. I also used append=False but same error happened.

From other threads, it seems set_index() and unstack() are more efficient than pivot_table(). I also don't want to use pivot_table() because it requires aggregation function and my answers column doesn't contain numeric data. I didn't want to use default (mean()) so I ended up using first(). Any insights on why one method works and another doesn't?

rp1
  • 159
  • 2
  • 10

1 Answers1

1

AttributeError: 'NoneType' object has no attribute 'unstack'

When you use inplace = True in set_index it modified the dataframe in place. It doesn't return anything(None). So you can't use unstack on None object.

inplace : boolean, default False

Modify the DataFrame in place (do not create a new object)

Use:

df1 = pdDF.set_index(['items_id','responseTime','label']).unstack('label')    
print(df1)

# Output:

id  responseTime    category_1  category_2 category_3 category_8
ABC  2018-06-24           [3]     [10]         [10]       NULL
DEF  2018-06-25           [7]     NULL         NULL       [10]
GHI  2018-06-28           NULL    NULL         [7]        NULL
Community
  • 1
  • 1
Abhi
  • 4,068
  • 1
  • 16
  • 29
  • 1
    Argh! Dang it, i can't believe I missed such a simple thing :) Thank you. That fixed it and I am marking your answers as Accepted. Additionally, there was one more error in my code: `append=True` should be `append=False` because if I didn't set it to `False`, original index would be preserved along with additional columns being appended to it, and I would never go from 6 rows pivoted to 3 rows. – rp1 Sep 07 '18 at 23:25
  • @rp1 Yes! That's why I didn't include `append=True` in the answer. I just forgot to mention it. Thanks:) – Abhi Sep 07 '18 at 23:27