Summarizing dataframe string values to count in Python 3

Question

In the screenshot below you'll find a dataframe that contains string values in each cell. What I would like to do is create a new dataframe from this one containing 3 columns of counts: 'Very interested', 'Somewhat interested', and 'Not interested'. I don't know how to transform the original df into this new one. I tried counting the values that meet a condition such as 'Very interested' and putting them into a new df, but the numbers don't seem right.

I would appreciate any help here. Thank you.

EDIT: here is also the code to reproduce a dataframe similar to the one in the screenshot:

import pandas as pd

# Every column (1 to 6) holds the same six answers
answers = ['Very interested', 'Not interested', 'Somewhat interested',
           'Very interested', 'Not interested', 'Somewhat interested']
df = pd.DataFrame({col: answers for col in range(1, 7)},
                  index=['Big Data', 'Data Analysis', 'Data Journalism',
                         'Data Visualization', 'Deep Learning', 'Machine Learning'])

As per the desired output, it should be something like this:

Could you include your expected output dataframe in your post? — rahlf23, Sep 12 '18 at 14:26
Please read up on [how to ask a good pandas question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). Your code contains no usable input (because you pasted an image), no expected output, and shows no research effort. — DSM, Sep 12 '18 at 14:36
@rahlf23 Sorry, i just edited the question and added what you were asking for — Miguel 2488, Sep 13 '18 at 08:04

jezrael · Accepted Answer · 2018-09-12T14:34:15.783

I think you need to reshape with melt, then get the counts with GroupBy.size and Series.unstack:

df = (df.rename_axis('val')
        .reset_index()
        .melt('val', var_name='a', value_name='b')
        .groupby(['val','b'])
        .size()
        .unstack(fill_value=0))
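
To make the intermediate steps visible, here is a minimal sketch of the same melt approach split in two. The `sample` frame and the names `long_form` and `counts` are only illustrative assumptions, not part of the original data:

```python
import pandas as pd

# Hypothetical sample data, smaller than the frame in the question
sample = pd.DataFrame(
    {1: ['Very interested', 'Not interested', 'Not interested'],
     2: ['Very interested', 'Not interested', 'Very interested'],
     3: ['Very interested', 'Somewhat interested', 'Somewhat interested'],
     4: ['Somewhat interested', 'Very interested', 'Very interested']},
    index=['Data Visualization', 'Machine Learning', 'Data Analysis'])

# Step 1: name the index 'val', turn it into a column, then melt to long
# form so that each row is one (row label, original column, answer) triple
long_form = (sample.rename_axis('val')
                   .reset_index()
                   .melt('val', var_name='a', value_name='b'))

# Step 2: count rows per (row label, answer) pair and pivot the answers
# into columns; pairs that never occur are filled with 0
counts = long_form.groupby(['val', 'b']).size().unstack(fill_value=0)
print(counts)
```

With this sample, the 'Data Visualization' row should end up with 3 for 'Very interested', 1 for 'Somewhat interested' and 0 for 'Not interested'.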

Another solution uses stack, then counts with SeriesGroupBy.value_counts and Series.unstack:

df = (df.stack()
        .groupby(level=0)
        .value_counts()
        .unstack(fill_value=0))
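
As a rough sketch, the stack approach can be tried on a small frame like this. The `sample` data is a hypothetical stand-in, and the final `reindex` step is my own addition (an assumption about the column order the question asks for), since `unstack` returns the columns sorted alphabetically:

```python
import pandas as pd

# Hypothetical sample data, smaller than the frame in the question
sample = pd.DataFrame(
    {1: ['Very interested', 'Not interested', 'Not interested'],
     2: ['Very interested', 'Not interested', 'Very interested'],
     3: ['Very interested', 'Somewhat interested', 'Somewhat interested'],
     4: ['Somewhat interested', 'Very interested', 'Very interested']},
    index=['Data Visualization', 'Machine Learning', 'Data Analysis'])

# stack() gives one long Series indexed by (row label, column label);
# grouping on the first index level and counting values gives per-row counts
counts = (sample.stack()
                .groupby(level=0)
                .value_counts()
                .unstack(fill_value=0))

# Assumed column order taken from the question; an answer that never
# appears in the data would get a column of zeros
order = ['Very interested', 'Somewhat interested', 'Not interested']
counts = counts.reindex(columns=order, fill_value=0)
print(counts)
```

Here 'Machine Learning' should show 1, 1 and 2 in the three columns, in that order.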

edited Sep 12 '18 at 14:34

answered Sep 12 '18 at 14:29


Hi @jezrael, thank you very much!! That was exactly what I was looking for. I'm glad to see someone understood what I was asking :) – Miguel 2488 Sep 13 '18 at 07:21

@Miguel2488 - There is a problem in your question: it is not possible to copy the data. You can improve your question by adding `df = pd.DataFrame({1: ['vi', 'ni', 'ni'], 2: ['vi', 'ni', 'vi'], 3: ['vi', 'si', 'si'], 4: ['si', 'vi', 'vi']}, index=['dv','ml','das'])` - feel free to modify it ;) – jezrael Sep 13 '18 at 07:24
All right, I'm editing it – Miguel 2488 Sep 13 '18 at 07:57
