
I'm trying to run a crosstab on my dataframe called "d_recent" using the following line of code:

pd.crosstab(d_recent['BinnedAge'], d_recent['APBI'])

The output I am getting is this:

|Age Bin|Brachytherapy|EBRT|IORT|
|-------|-------------|----|----|
|51-60|1|1|0|
|71-80|86|62|11|
|61-70|2578|723|276|
|41-50|9386|2049|1188|
|81-90|13860|3257|2449|
|31-40|7725|2078|1628|
|21-30|1958|615|425|

This is wrong. What it should look like is:

|Age Bin|Brachytherapy|EBRT|IORT|
|-------|-------------|----|----|
|21-30|1|1|0|
|31-40|86|62|11|
|41-50|2578|723|276|
|51-60|9386|2049|1188|
|61-70|13860|3257|2449|
|71-80|7725|2078|1628|
|81-90|1958|615|425|

Any idea what is going on here and how I can fix it? I can tell that the order of the rows in the first table is related to the order the specific bins are encountered in my dataframe. I can get the correct output if I sort by age prior to running the crosstab, but this isn't a preferable solution because I need to do this with multiple variables. Thanks!
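For reference, here is a minimal sketch of the setup described above. The data is a made-up toy sample, not the real `d_recent`, and the `Age` column, bin edges, and labels are assumptions for illustration. It keeps `BinnedAge` as an ordered categorical from `pd.cut` and checks that the crosstab rows come out in bin order no matter how the dataframe rows are ordered:

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are made up.
d_recent = pd.DataFrame({
    "Age": [85, 45, 67, 23, 52, 38, 74, 66, 58, 49],
    "APBI": ["EBRT", "IORT", "Brachytherapy", "EBRT", "Brachytherapy",
             "EBRT", "IORT", "Brachytherapy", "EBRT", "Brachytherapy"],
})

# Assumed binning: right-closed decade bins. pd.cut with labels returns an
# ordered categorical, so the bin order travels with the column.
edges = [20, 30, 40, 50, 60, 70, 80, 90]
labels = ["21-30", "31-40", "41-50", "51-60", "61-70", "71-80", "81-90"]
d_recent["BinnedAge"] = pd.cut(d_recent["Age"], bins=edges, labels=labels)

# Rows of the crosstab follow the category order, not the order of appearance.
table = pd.crosstab(d_recent["BinnedAge"], d_recent["APBI"])

# Counts are tied to labels, so shuffling the dataframe should not change them.
shuffled = d_recent.sample(frac=1, random_state=0)
table_shuffled = pd.crosstab(shuffled["BinnedAge"], shuffled["APBI"])

print(table)
print((table.values == table_shuffled.values).all())
```

If `BinnedAge` is already stored as plain strings, converting it with `pd.Categorical(d_recent['BinnedAge'], categories=labels, ordered=True)` before calling `pd.crosstab` should give the same row ordering without sorting the whole dataframe.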

sammywemmy
cladbury
  • Does the bin-range need to be sorted first? – Jairus Jan 21 '21 at 00:48
  • It doesn't look like it takes a range, but rather associative values. [ref crosstab](https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html), I think you are looking for this https://pbpython.com/pandas-qcut-cut.html – Jairus Jan 21 '21 at 00:51
  • and this https://stackoverflow.com/questions/45273731/binning-column-with-python-pandas – Jairus Jan 21 '21 at 00:52
  • @JairusMartin Thank you. I have actually already used cut to bin the data into 'BinnedAge', which is a string variable. I have verified that there are a total of 2 entries with the value '21-30' under the 'BinnedAge' variable. I'm confused why I would have to sort the whole dataframe to get the correct counts to line up... – cladbury Jan 21 '21 at 01:06
  • Can you update your question with an [MRE](https://stackoverflow.com/help/minimal-reproducible-example), i.e. the data and code, so someone can reproduce your result? – Jairus Jan 21 '21 at 02:01
  • try solving the problem using qcut – Golden Lion Jan 22 '21 at 21:05

0 Answers