
I have a list of lists like this:

[[person_a,code_1],[person_a,code_2],[person_a,code_3],[person_b,code_1],[person_b,code_1],[person_b,code_1],[person_a,code_4],[person_b,code_2]...]

I would like to achieve the following:

          code_1  code_2  code_3  code_4
person_a  .2500   .2500   .2500   .2500
person_b  .6667   .3333   0.0     0.0

I've used prop.table in R to do this before, and I'm wondering whether there is a Python equivalent. I can convert my list of lists to a DataFrame; what I'm after is a function that gives each person's code proportions.
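For reference, the computation being asked for can be sketched in plain Python without pandas. This is a minimal illustration, assuming the input is exactly the [person, code] pairs shown above (the "..." tail is left out); the helper name `code_proportions` is hypothetical. It uses `float(total)` so the division also behaves correctly on Python 2.7.

```python
from collections import Counter, defaultdict

# Sample pairs from the question (the "..." tail is not reproduced)
data = [
    ['person_a', 'code_1'],
    ['person_a', 'code_2'],
    ['person_a', 'code_3'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_a', 'code_4'],
    ['person_b', 'code_2'],
]


def code_proportions(rows):
    """Hypothetical helper: {person: {code: share of that person's rows}}."""
    counts = defaultdict(Counter)
    for person, code in rows:
        counts[person][code] += 1
    # Every code seen anywhere becomes a column for every person
    all_codes = sorted(set(code for _, code in rows))
    result = {}
    for person, person_counts in counts.items():
        total = float(sum(person_counts.values()))
        # A Counter returns 0 for missing keys, so unused codes get 0.0
        result[person] = dict(
            (code, person_counts[code] / total) for code in all_codes
        )
    return result


props = code_proportions(data)
for person in sorted(props):
    print("%s %s" % (person, [round(props[person][c], 4) for c in sorted(props[person])]))
```

Each person's row sums to 1, which is what prop.table(tab, 1) produces for a table with people as rows.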

Alexander
ChuckF
  • Do you mean a pandas dataframe? – martineau Dec 13 '18 at 19:33
  • I do mean a pandas dataframe. I could convert my list of lists to a dataframe, but what I really want to know is how to generate each individual person's code proportions – ChuckF Dec 13 '18 at 19:36
  • This looks like python code. Did you really want the R tag? Are R solutions acceptable? – G5W Dec 13 '18 at 19:37
  • @G5W: There's no code in the question. – martineau Dec 13 '18 at 19:39
  • ChuckF: What is converting it to a dataframe with Pandas doing that's not what you want? What code are you using? – martineau Dec 13 '18 at 19:40
  • I left the R tag in because I've done this with R's prop.table function and was hoping someone familiar with R would know the Python equivalent – ChuckF Dec 13 '18 at 19:41
  • @martineau: What it's not doing is generating the code proportions, i.e. person B had 2 instances of code_1 out of the 3 codes he issued. Also, I'm on Python 2. – ChuckF Dec 13 '18 at 19:43

1 Answer


Using pandas: count each (person, code) pair, then divide each row by that person's total.

import pandas as pd

data = [
    ['person_a', 'code_1'],
    ['person_a', 'code_2'],
    ['person_a', 'code_3'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_a', 'code_4'],
    ['person_b', 'code_2']]

df = pd.DataFrame(data, columns=['person', 'code'])

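# Count how often each person issued each code (one column per code); the
# relative_frequency column only exists so that count() has something to count
df = df.assign(relative_frequency=1).groupby(['person', 'code']).count().unstack()
# >>> df
#          relative_frequency                     
# code                 code_1 code_2 code_3 code_4
# person                                          
# person_a                1.0    1.0    1.0    1.0
# person_b                3.0    1.0    NaN    NaN

# Divide each row by its row total to turn counts into per-person proportions
>>> df.div(df.sum(axis=1), axis=0)
         relative_frequency                     
code                 code_1 code_2 code_3 code_4
person                                          
person_a               0.25   0.25   0.25   0.25
person_b               0.75   0.25    NaN    NaN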
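A closer analogue of R's prop.table(tab, 1) is probably `pd.crosstab` with `normalize='index'`, which builds the person-by-code contingency table and divides each row by its total in one call. Because combinations that never occur are counted as 0 rather than left missing, they come out as 0.0 instead of NaN, matching the desired output in the question. This is a sketch using the same sample data as above, not part of the original answer.

```python
import pandas as pd

data = [
    ['person_a', 'code_1'],
    ['person_a', 'code_2'],
    ['person_a', 'code_3'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_a', 'code_4'],
    ['person_b', 'code_2'],
]
df = pd.DataFrame(data, columns=['person', 'code'])

# Rows = people, columns = codes; normalize='index' makes each row sum to 1,
# and person/code pairs that never occur become 0.0 instead of NaN
proportions = pd.crosstab(df['person'], df['code'], normalize='index')
print(proportions.round(4))
```

If you would rather stay with groupby, `df.groupby('person')['code'].value_counts(normalize=True).unstack(fill_value=0)` should give the same proportions.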
Alexander