0

Below is df, I need to analyze the data.

   gender   dob          list
0   M   01/01/87    [['Office/Work'],['31-35'], ['Salaried']]
1   M   01/01/94    [[Movies,Restaurants'],['21-25'], ['Salaried']]
2   M   01/01/95    [['College/Park'],['21-25'],['Student']] 
3   F   01/01/97    [['College'], ['21-25'], ['Student']]

Expected Out 1. I need to analyse how many salaried are there in the dataset

df['salaried']

Total = 2, Male = 2, Female = 0

  1. How many student are there in the list df['students']

Total = 2, Male = 1, Female = 1

  1. How many are going to Movies df['Movies']

Total = 1, Male = 1, Female=0

  1. Group by different age group df['age_group']

    Age_Group Total Male Female ['21-25'] 3 2 1 ['31-35'] 1 1 0

  2. What is the percentage of Male to Female

round(len(df.loc[df['gender'] == 'M']) / (len(df.loc[df['gender'] == 'M']) + len(df.loc[df['gender'] == 'F'])),2)*100

  • Is this a homework exercise? – ignoring_gravity Jan 01 '20 at 18:33
  • @ignoring_gravity, its actually a problem. I have done with very long solution which is not that up to mark. Got stuck so put a question. You can take up the issue –  Jan 01 '20 at 18:43
  • how do you want your final df to look like. i know you have posted the expected results, but if you can post the dataframe expected, that'd help – anky Jan 01 '20 at 18:56

1 Answers1

1

You can use explode to split column's lists of element into rows.

df=pd.DataFrame({'gender':['M','M','M','F'],'B':[[['Office/Work'],['31-35'], ['Salaried']],[['Movies,Restaurants'],['21-25'], ['Salaried']],[[
'College/Park'],['21-25'],['Student']],[['College'], ['21-25'], ['Student']]]}) 

df:

  gender                                            B
0      M         [[Office/Work], [31-35], [Salaried]]
1      M  [[Movies,Restaurants], [21-25], [Salaried]]
2      M         [[College/Park], [21-25], [Student]]
3      F              [[College], [21-25], [Student]]

x=df.explode('B')

x:

  gender                     B
0      M         [Office/Work]
0      M               [31-35]
0      M            [Salaried]
1      M  [Movies,Restaurants]
1      M               [21-25]
1      M            [Salaried]
2      M        [College/Park]
2      M               [21-25]
2      M             [Student]
3      F             [College]
3      F               [21-25]
3      F             [Student]

x['B']=x.B.astype(str) 
final_df=x.groupby(['B','gender']).size().unstack(fill_value=0)  

final_df:

gender                  F  M
B                           
['21-25']               1  2
['31-35']               0  1
['College']             1  0
['College/Park']        0  1
['Movies,Restaurants']  0  1
['Office/Work']         0  1
['Salaried']            0  2
['Student']             1  1

You can compute total using F, M columns.

Pygirl
  • 12,969
  • 5
  • 30
  • 43
  • `df_final_resp.assign(final_resp=df_final_resp.final_resp.str.split('],').replace('[','')).explode('final_resp').head()` your code didn't work for me, I have done above one, can you tell me what could be the reason –  Jan 02 '20 at 04:52
  • 1
    You can use `df['col'].apply(', '.join)` in place of `df_final_resp.final_resp.str.split('],').replace('[',''))` source:https://stackoverflow.com/questions/37347725/converting-a-panda-df-list-into-a-string/37347837 – Pygirl Jan 02 '20 at 06:34
  • 1
    The reason it's not working because you are making your list as a string. explode works for list. So you should first explode as I did. If you don't want bracket do this : `x['B'] =x['B'].apply(''.join)` – Pygirl Jan 02 '20 at 06:36
  • Just replace `x['B']=x.B.astype(str)` with `x['B'] =x['B'].apply(''.join)` and you are ready to go. – Pygirl Jan 02 '20 at 06:42