Pandas DF Groupby

Question

I have a dataframe of student responses[S1-S82] and each strand corresponding to the response. I want to know the count of each response given wrt each strand. If the student marked answer correctly I want to know the strand name and no. of correct responses, if the answer is wrong I want to know the strand name and no. of wrong responses(similar to value counts). I am attaching a screenshot of the dataframe. https://prnt.sc/1125odu

I have written the following code data_transposed['Counts'] = data_transposed.groupby(['STRAND-->'])['S1'].transform('count') but it is really not helping me get what I want. I am looking for an option similar to value_counts to plot the data. Please look into it and help me. Thank you,

provide your data in a usable text format. `print(df.to_markdown())`` or `print(df.to_string())` — Rob Raymond, Apr 02 '21 at 08:02
Can you please share the data as a text. Here's a Stack Overflow wiki to help you share your [pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Joe Ferndz, Apr 02 '21 at 08:17
screenshot is not useful - can't use OCR to populate a DF to show you a possible solution — Rob Raymond, Apr 02 '21 at 09:12

Tustoyevsky · Answer 1 · 2021-04-04T14:08:58.130

0

For single student you can do:

df.groupby(['Strand-->', 'S1']).size().to_frame(name = 'size').reset_index()

If you want to calculate all students at once you can do:

df_m = pd.melt(df, id_vars=['Strand-->'], value_vars=df.columns[1:]).rename({'variable':'result'},axis=1).sort_values(['result'])
df_m['result'].groupby([df_m['Strand-->'],df_m['value']]).value_counts().unstack(fill_value=0).reset_index()

edited Apr 04 '21 at 14:08

answered Apr 02 '21 at 08:06

Tustoyevsky

66
4

Hi, yes I am looking for a way to count the 0's and 1's but I also want wrt the strand how many are right or wrong i.e in strand geometry how many are 1's and 0's of S1, in mensuration how many are 1's and 0's of S1 like that for all the strands, and I can apply it to all students. – Lavanya Anupindi Apr 02 '21 at 08:15
Hi thats exactly what I am looking for. The output is as follows https://prnt.sc/1127nvd. Can u also guide me how to plot this data, in a pie chart or a donut as I could easily plot this in power bi but I want to do the plotting in pandas using available visualization libraries. – Lavanya Anupindi Apr 02 '21 at 08:57
There are many visualization tutorials online, you can refer to this as a staring point: https://www.dataforeverybody.com/matplotlib-seaborn-pie-charts/ – Tustoyevsky Apr 02 '21 at 09:04
Sure, will refer to the tutorial. Thanks a lot for ur timely help. – Lavanya Anupindi Apr 02 '21 at 09:10
Please avoid doing loops and merging. You are merging the dataframe 82 times to get the result. – Joe Ferndz Apr 02 '21 at 09:49
Yes you are right, this is an ugly solution, updated the answer again with value_counts method. – Tustoyevsky Apr 04 '21 at 14:04

Joe Ferndz · Accepted Answer · 2021-04-02T09:43:48.130

I think you are looking to groupby the Strands for each student S1 thru S82.

Here's how I would do it.

Step 1: Create a DataFrame with groupby Strand--> where value is 0
Step 2: Create another DataFrame with groupby Strand--> where value is 1
Step 3: Add a column in each of the dataframes and assign value of 0 or 1 to represent which data it grouped
Step 4: Concatenate both dataframes.
Step 5: Rearrange the columns to have Strand-->, val, then all students S1 thru S82
Step 6: Sort the dataframe using Strand--> so you get the values in the right order.

The code is as shown below:

import pandas as pd
import numpy as np

d = {'Strand-->':['Geometry','Geometry','Geometry','Geometry','Mensuration',
                                'Mensuration','Mensuration','Geometry','Algebra','Algebra',
                                'Comparing Quantities','Geometry','Data Handling','Geometry','Geometry']}
for i in range(1,83): d ['S'+str(i)] = np.random.randint(0,2,size=15) 
df = pd.DataFrame(d)
print (df)

df1 = df.groupby('Strand-->').agg(lambda x: x.eq(0).sum())
df1['val'] = 0
df2 = df.groupby('Strand-->').agg(lambda x: x.ne(0).sum())
df2['val'] = 1
df3 = pd.concat([df1,df2]).reset_index()
dx = [0,-1] + [i for i in range(1,83)]
df3 = df3[df3.columns[dx]].sort_values('Strand-->').reset_index(drop=True)
print (df3)

The output of this will be as follows:

Original DataFrame:

               Strand-->  S1  S2  S3  S4  S5  ...  S77  S78  S79  S80  S81  S82
0               Geometry   0   1   0   0   1  ...    1    0    0    0    1    0
1               Geometry   0   0   0   1   1  ...    1    1    1    0    0    0
2               Geometry   1   1   1   0   0  ...    0    0    1    0    0    0
3               Geometry   0   1   1   0   1  ...    1    0    0    1    0    1
4            Mensuration   1   1   1   0   1  ...    0    1    1    1    0    0
5            Mensuration   0   1   1   1   0  ...    1    0    0    1    1    0
6            Mensuration   1   0   1   1   1  ...    0    1    0    0    1    0
7               Geometry   1   0   1   1   1  ...    1    1    1    0    0    1
8                Algebra   0   0   1   0   1  ...    1    1    0    0    1    1
9                Algebra   0   1   0   1   1  ...    1    1    1    1    0    1
10  Comparing Quantities   1   1   0   1   1  ...    1    1    0    1    1    0
11              Geometry   1   1   1   1   0  ...    0    0    1    0    1    0
12         Data Handling   1   1   0   0   0  ...    1    0    1    1    0    0
13              Geometry   1   1   1   0   0  ...    1    1    1    1    0    0
14              Geometry   0   1   0   0   1  ...    0    1    1    0    1    0

Updated DataFrame: Note here that column 'val' will be 0 or 1. If 0, then it is the count of 0s. If 1, then it is the count of 1s.

              Strand-->  val  S1  S2  S3  S4  ...  S77  S78  S79  S80  S81  S82
0               Algebra    0   2   1   1   1  ...    0    0    1    1    1    0
1               Algebra    1   0   1   1   1  ...    2    2    1    1    1    2
2  Comparing Quantities    0   0   0   1   0  ...    0    0    1    0    0    1
3  Comparing Quantities    1   1   1   0   1  ...    1    1    0    1    1    0
4         Data Handling    0   0   0   1   1  ...    0    1    0    0    1    1
5         Data Handling    1   1   1   0   0  ...    1    0    1    1    0    0
6              Geometry    0   4   2   3   5  ...    3    4    2    6    5    6
7              Geometry    1   4   6   5   3  ...    5    4    6    2    3    2
8           Mensuration    0   1   1   0   1  ...    2    1    2    1    1    3
9           Mensuration    1   2   2   3   2  ...    1    2    1    2    2    0

This has helped in resolving the problem, thank you . – Lavanya Anupindi Apr 30 '21 at 11:39 — Lavanya Anupindi, Apr 30 '21 at 11:39

Pandas DF Groupby

2 Answers2