1

Hello I have the following set of data in csv:

Group           Size     Some_other_column1      Some_other_column2

Short          Small            blabla1                     blabla6    
Moderate       Medium           babla3                      blabla8
Short          Small            blabla2                     blabla7
Moderate       Small            blabla4                     blabla9
Tall           Large            blabla5                     blabla10
Short          Medium           blabla11                    blabla12

I would like to get the following result using python code:

Group           Size      Count     Some_other_column1      Some_other_column2

Short          Small       2            blabla1                     blabla6
Moderate       Medium      1            babla3                      blabla8
Short          Small       2            blabla2                     blabla7
Moderate       Small       1            blabla4                     blabla9
Tall           Large       1            blabla5                     blabla10
Short          Medium      1            blabla11                    blabla12

Basically I need to count the number of group-size pairs and create a new column for that called, let's say, "Count", keeping all the other columns the same. I can use pandas or anything that can help.

For reference, there was another question asked on this topic, but it does not solve my problem since I have multiple columns that I need to keep: Python: get a frequency count based on two columns (variables) in pandas dataframe

There is another topic here: How to assign a name to the a size() column? But this is also not answering my question because I have 2 more columns ("some other column1/2") that I do not want to indirectly drop by applying the method described at the above link. Also, what is equally important, I do not want to merge pairs, I need to keep all of them, because they have different values on Some_other_column1/2.

dspencer
  • 4,297
  • 4
  • 22
  • 43
Catalin Bcn
  • 13
  • 1
  • 6

1 Answers1

0

You need insert with GroupBy.transform of size:

df.insert(2, 'Count', df.groupby(['Group','Size'])['Size'].transform('size'))
print (df)
      Group    Size  Count Some_other_column1 Some_other_column2
0     Short   Small      2            blabla1            blabla6
1  Moderate  Medium      1             babla3            blabla8
2     Short   Small      2            blabla2            blabla7
3  Moderate   Small      1            blabla4            blabla9
4      Tall   Large      1            blabla5           blabla10
5     Short  Medium      1           blabla11           blabla12
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252