1

I have a simple dataframe.


        SITE    OPERTN

152     R1H12    V011
200     RR801    V014
342     RM301    V011
375     RTH08    V011
469     RYJ02    V011
... ... ...
634503  RRK15   V011
634763  RRVNQ   V014
635237  R1H12   V011
635512  RTH08   V011
635548  RJE01   V011
4689 rows × 2 columns

I want to convert this to a simple spreadsheet type table of:

site    V011   V012   V014

R1H12   count  count  count
RR801   count  count  count
RM301   count  count  count
RTH08   count  count  count
etc

I have tried pivoting it, and pivot_table but can't get the agfunc=sum to do it.

capnahab
  • 343
  • 3
  • 14
  • Does this answer your question? [Python Pandas : pivot table with aggfunc = count unique distinct](https://stackoverflow.com/questions/12860421/python-pandas-pivot-table-with-aggfunc-count-unique-distinct) – villoro Mar 07 '20 at 22:33

3 Answers3

2

You can use pd.crosstab():

pd.crosstab(df['SITE'], df['OPERTN'])

Or df.group_by(), e.g:

df.groupby(['SITE', 'OPERTN']).size().unstack(fill_value=0)
AChampion
  • 29,683
  • 4
  • 59
  • 75
1

It should be possible to do it with:

df. pivot_table(values='OPERTN' , rows='SITE', cols='OPERTN', 
                     aggfunc=lambda x: len(x.unique())
Renaud
  • 2,709
  • 2
  • 9
  • 24
  • It would have to be ``` pd.pivot_table() ``` If you're going to call it on a dataframe, it would need to be "pivot" instead "pivot_table" and wouldn't be able to use an aggregating function. – LTheriault Mar 07 '20 at 22:45
1

This works to accomplish what you're asking for and will also replace nulls with 0s where a value isn't found for a 'SITE' when counting.

One thing to remember is that there is a similar function called "pivot". The one below is "pivot_table" a method of the pandas library while "pivot" is a function to call on a DataFrame. You can get similar results with either, but "pivot" can't aggregate data and the arguments you need to pass to them are slightly different.

df = pd.pivot_table(df,values='OPERTN' , index='SITE', columns='OPERTN', 
                     aggfunc={'OPERTN':'count'}).fillna(0)
LTheriault
  • 1,180
  • 6
  • 15