1

I have a pandas dataframe with two columns. I want to measure the transition count, that is, the number of times that each unique first column value is related to each unique second column value. This should be a pivot or pivot_table but I am stuck. In the code pasted, trial is the input dataframe, and ans is the answer dataframe what I would like to see by manipulating the trial dataframe.

I did not spot a similar dataframe question which has only two columns. The others used pivot on a third table where a mean or sum aggfunc were used. This is a case where there are only two columns, and I want to count the transitions. The other questions also used numerical columns where aggregation is possible. I want to count the columns for a non-numeric value.

If there is a similar question, would be very helpful if someone can point me to it.

trial=pd.DataFrame({'col1':list('AABCCCDDDD'),'col2':list('XYXXXYYXZZ')})
index  col1  col2

 0      A       X

 1      A       Y

 2      B       X

 3      C       X

 4      C       X

 5      C       Y

 6      D       Y

 7      D       X

 8      D       Z

 9      D       Z
ans=pd.DataFrame({'col1':list('ABCD'),'X':[1,1,2,1],'Y':[1,0,1,1],'Z':[0,0,0,2]})
ans.set_index('col1')
col1      X  Y  Z

  A       1  1  0

  B       1  0  0

  C       2  1  0

  D       1  1  2
vesuvius
  • 435
  • 4
  • 20
noobcoder
  • 11
  • 2
  • This did not seem to have been answered earlier although it was marked duplicate, as other examples used a third column for pivoting under 'values'. However adding a dummy third column worked, which I did using reset_index(). trial.reset_index().pivot_table(index='col1',columns='col2',values='index',aggfunc='count') gives the correct answer – noobcoder Jul 15 '19 at 11:24
  • pd.crosstab(trial.col1, trial.col2,margins=True) works well too – noobcoder Jul 16 '19 at 21:05

0 Answers0