Creating a Dataframe of Proportions

Question

I have a list of lists like this:

[[person_a,code_1],[person_a,code_2],[person_a,code_3],[person_b,code_1],[person_b,code_1],[person_b,code_1],[person_a,code_4],[person_b,code_2]...]

I would like to achieve the following:

          code_1  code_2  code_3  code_4
person a   .2500   .2500   .2500   .2500
person b   .6667   .3333   0.0     0.0

I've used prop.table in R to achieve this before, but I'm wondering if there is a Python equivalent. I can convert my list of lists to a dataframe; what I'm interested in is a function that can generate each person's code proportions.

I do mean a pandas dataframe. I could convert my list of lists to a dataframe, but what I really want to know is how to generate each individual person's code proportions — ChuckF, Dec 13 '18 at 19:36
This looks like python code. Did you really want the R tag? Are R solutions acceptable? — G5W, Dec 13 '18 at 19:37
ChuckF: What is converting it to a dataframe with Pandas doing that's not what you want? What code are you using? — martineau, Dec 13 '18 at 19:40
I left the R tag in as I've done this using R's prop.table function and was hoping someone familiar with R would know the Python equivalent — ChuckF, Dec 13 '18 at 19:41
@martineau: What it's not doing is generating the code proportions, i.e. person B had 2 instances of code_1 out of the 3 codes he issued. And this is Python 2 — ChuckF, Dec 13 '18 at 19:43

score 1 · Accepted Answer · answered Dec 13 '18 at 19:42

Using pandas

import pandas as pd

data = [
    ['person_a', 'code_1'],
    ['person_a', 'code_2'],
    ['person_a', 'code_3'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_a', 'code_4'],
    ['person_b', 'code_2']]

df = pd.DataFrame(data, columns=['person', 'code'])

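# Count each (person, code) pair, then pivot the codes into columns.
# fill_value=0 gives 0 instead of NaN for codes a person never used.
df = df.assign(relative_frequency=1).groupby(['person', 'code']).count().unstack(fill_value=0)
# >>> df
#          relative_frequency
# code                 code_1 code_2 code_3 code_4
# person
# person_a                  1      1      1      1
# person_b                  3      1      0      0

# Divide each row by its row total to get per-person proportions.
# >>> df.div(df.sum(axis=1), axis=0)
#          relative_frequency
# code                 code_1 code_2 code_3 code_4
# person
# person_a               0.25   0.25   0.25   0.25
# person_b               0.75   0.25   0.00   0.00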
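
As a possibly closer analogue to R's prop.table with row margins, pandas also provides pd.crosstab with a normalize argument. The following is a minimal sketch rather than part of the original answer; it reuses the same sample data, and the variable name proportions is just an illustrative choice.

```python
import pandas as pd

data = [
    ['person_a', 'code_1'],
    ['person_a', 'code_2'],
    ['person_a', 'code_3'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_b', 'code_1'],
    ['person_a', 'code_4'],
    ['person_b', 'code_2']]

df = pd.DataFrame(data, columns=['person', 'code'])

# Cross-tabulate person against code; normalize='index' divides each
# row by its own total, so every person's proportions sum to 1.
# Codes a person never used come out as 0.0 rather than NaN.
proportions = pd.crosstab(df['person'], df['code'], normalize='index')
print(proportions)
```

With this data, person_b's row should come out as 0.75 for code_1, 0.25 for code_2, and 0.0 for the remaining codes. This avoids the helper column and the MultiIndex column header that the groupby/unstack approach produces.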
