I have two columns in a pandas dataframe.

Column 1 is ed and contains strings (e.g. 'a','a','b','c','c','a')

ed column = ['a','a','b','c','c','a'] 

Column 2 is job and also contains strings (e.g. 'aa','bb','aa','aa','bb','cc')

job column = ['aa','bb','aa','aa','bb','cc'] #these are example values from column 2 of my pandas data frame

I then generate a two column frequency table like this:

my_counts= pdata.groupby(['ed','job']).size().unstack().fillna(0)

How do I divide the frequencies in one column of that frequency table by the frequencies in another column? I want to use the resulting ratio with argsort() to sort the table, but I don't know how to reference each column of the resulting table.

NorthCat
Chris
  • It's very hard to tell what's going on here without some data: please try to [include a small, copy-pasteable example of what your data looks like](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Marius Aug 30 '14 at 01:47

1 Answer

I initialized the data as follows:

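import pandas as pd

ed_col = ['a','a','b','c','c','a']
job_col = ['aa','bb','aa','aa','bb','cc']
pdata = pd.DataFrame({'ed': ed_col, 'job': job_col})
# Count each (ed, job) pair, pivot job into columns, fill missing pairs with 0
my_counts = pdata.groupby(['ed','job']).size().unstack().fillna(0)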

Now my_counts looks like this:

job  aa  bb  cc
ed             
a     1   1   1
b     1   0   0
c     1   1   0

To access a column, you could use my_counts.aa or my_counts['aa']. To access a row, you could use my_counts.loc['a'].
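As a quick illustration of those access patterns, here is a minimal sketch that rebuilds the same table from the example values. The variable names col_aa, row_a and cell are only illustrative and are not part of the original answer.

```python
import pandas as pd

pdata = pd.DataFrame({
    'ed': ['a', 'a', 'b', 'c', 'c', 'a'],
    'job': ['aa', 'bb', 'aa', 'aa', 'bb', 'cc'],
})
my_counts = pdata.groupby(['ed', 'job']).size().unstack().fillna(0)

col_aa = my_counts['aa']          # column: counts of job 'aa' for each ed value
row_a = my_counts.loc['a']        # row: counts of every job for ed == 'a'
cell = my_counts.loc['c', 'bb']   # single cell: count of the pair ('c', 'bb')
```

Bracket access (my_counts['aa']) is usually the safer habit, since attribute access fails for column names that contain spaces or clash with DataFrame methods.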

So the frequencies of aa divided by bb are my_counts['aa'] / my_counts['bb']. Note that a zero count in bb gives inf for that row (row b in this example).

Now, if you want the table sorted by that ratio, you can do:

my_counts.iloc[(my_counts['aa'] / my_counts['bb']).argsort()]
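If the infinite ratios are a problem, one possible variation (a sketch, not the only way to do it) is to store the ratio as its own column, treat undefined ratios as missing, and sort with sort_values instead of argsort. The column name ratio and the choice to push undefined ratios to the end are assumptions made for this example.

```python
import numpy as np
import pandas as pd

pdata = pd.DataFrame({
    'ed': ['a', 'a', 'b', 'c', 'c', 'a'],
    'job': ['aa', 'bb', 'aa', 'aa', 'bb', 'cc'],
})
my_counts = pdata.groupby(['ed', 'job']).size().unstack().fillna(0)

# Element-wise ratio; a zero in 'bb' produces inf (or NaN for 0/0)
ratio = my_counts['aa'] / my_counts['bb']

# Assumption for this sketch: treat infinite ratios as missing values
ratio = ratio.replace([np.inf, -np.inf], np.nan)

# Attach the ratio as a new column and sort by it, missing ratios last
sorted_counts = my_counts.assign(ratio=ratio).sort_values('ratio', na_position='last')
```

Keeping the ratio as a column also makes it easy to inspect the values the sort was based on.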
Korem
  • Thank you, this is an above and beyond answer! For anybody else who is having a similar issue, the underlying cause was that the text values in the two columns had leading whitespace from the import that needed to be removed via the .strip() function. – Chris Aug 31 '14 at 23:40
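For readers hitting the same whitespace problem, a minimal sketch of the cleanup might look like the following. The raw frame here is hypothetical data invented to mimic the leading whitespace described in the comment; on a pandas column the stripping is done through the .str accessor.

```python
import pandas as pd

# Hypothetical imported data: every value carries a leading space
raw = pd.DataFrame({
    'ed': [' a', ' a', ' b', ' c', ' c', ' a'],
    'job': [' aa', ' bb', ' aa', ' aa', ' bb', ' cc'],
})

clean = raw.copy()
for col in ['ed', 'job']:
    # Remove leading and trailing whitespace from each string value
    clean[col] = clean[col].str.strip()

my_counts = clean.groupby(['ed', 'job']).size().unstack().fillna(0)
```

After stripping, the column labels are 'aa', 'bb' and 'cc' again, so my_counts['aa'] resolves as expected. If the data comes from pd.read_csv, passing skipinitialspace=True may avoid the issue at import time.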