-1

I have a assignment dataset which consists of students' grades sorted by academic groups for various modules. In my assignment, I used the following code to acquire an output which I later hardcoded it into a table:

math_comparison_size = data.groupby(["Academic Group","Math.SemGrade"]).size()

enter image description here

Is there a way where I can combine/merge certain rows together to increase their values?

Such as:

A (combination of A and A+ values for each academic group E.g. for Group A it will be 28 instead of 7 and 21)

so on and forth...

Corralien
  • 109,409
  • 8
  • 28
  • 52
excpro
  • 9
  • 3

1 Answers1

1

Update

>>> df.unstack(level=0).groupby(df.index.levels[1].map(mapping)).sum()

Academic Group  Grp A  Grp B  Grp C  Grp D  Grp E
Math.SemGrade                                    
A                  43     49     93     82     39
B                  24     69     65     59     57
C                  20      8      5     23     13

Input data:

Academic Group  Math.SemGrade
Grp A           A                13
                A+               17
                AD                6
                B                 1
                B+                6
                C                 9
                C+                0
                D                 9
                D+                6
                F                20
Grp B           A                 6
                A+                8
                AD               19
                B                 6
                B+               10
                C                24
                C+                5
                D                11
                D+               29
                F                 8
Grp C           A                22
                A+               27
                AD               16
                B                14
                B+               14
                C                26
                C+               14
                D                 0
                D+               25
                F                 5
Grp D           A                29
                A+               23
                AD                2
                B                11
                B+               17
                C                 1
                C+               27
                D                 3
                D+               28
                F                23
Grp E           A                 2
                A+                9
                AD                4
                B                 9
                B+               15
                C                18
                C+               10
                D                 5
                D+               24
                F                13
dtype: int64

Old answer

If you want to group by the first letter (A -> A, A+ -> A, ...), you can use:

>>> (df.groupby(df['Maths Semester Grades'].str[0])
       .sum(numeric_only=True).reset_index())

  Maths Semester Grades  Grp A  Grp B  Grp C  Grp D  Grp E
0                     A     36     33     65     54     15
1                     B      7     16     28     28     24
2                     C      9     29     40     28     28
3                     D     15     40     25     31     29
4                     F     20      8      5     23     13

If you want to control the groups, use a mapping dict:

mapping = {'A': 'A', 'A+': 'A', 'AD': 'A', 'B': 'A', 'B+': 'A',
           'C': 'B', 'C+': 'B', 'D': 'B', 'D+': 'B',
           'E': 'C', 'F': 'C'}

>>> (df.groupby(df['Maths Semester Grades'].map(mapping))
       .sum(numeric_only=True).reset_index()

  Maths Semester Grades  Grp A  Grp B  Grp C  Grp D  Grp E
0                     A     43     49     93     82     39
1                     B     24     69     65     59     57
2                     C     20      8      5     23     13

Input dataframe:

>>> df
  Maths Semester Grades  Grp A  Grp B  Grp C  Grp D  Grp E
0                     A     13      6     22     29      2
1                    A+     17      8     27     23      9
2                    AD      6     19     16      2      4
3                     B      1      6     14     11      9
4                    B+      6     10     14     17     15
5                     C      9     24     26      1     18
6                    C+      0      5     14     27     10
7                     D      9     11      0      3      5
8                    D+      6     29     25     28     24
9                     F     20      8      5     23     13
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • Hello, I note that you used "df['Maths Semester Grades']". However, my dataframe does not have such a column and instead they have "["Academic Group","Math.SemGrade"]" I have tried changing your input columns to mine but it does not work – excpro Jan 06 '23 at 07:24
  • 1
    Your image shows a 'Maths Semester Grades' column. Please fix your input data with a reproducible example (not an image). – Corralien Jan 06 '23 at 07:27
  • I see. I am sorry for the misunderstanding. I have changed the image to my actual output data. – excpro Jan 06 '23 at 07:29
  • 1
    @excpro Your input is still not [reproducible](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – mozway Jan 06 '23 at 07:34
  • Yes, you have to post raw data and not image. I updated my answer according your comment. Does it solve your problem now? – Corralien Jan 06 '23 at 07:37
  • 1
    @Corralien yes I have seen it (I also had sent you a message on one of your posts, but those are deleted so quickly by flags…). Happy new year to you too and all the best :) – mozway Jan 06 '23 at 07:40