Get count of occurrences inside two columns inside a csv

Question

Hello, I have the following data in a CSV file:

Group           Size     Some_other_column1      Some_other_column2

Short          Small            blabla1                     blabla6    
Moderate       Medium           babla3                      blabla8
Short          Small            blabla2                     blabla7
Moderate       Small            blabla4                     blabla9
Tall           Large            blabla5                     blabla10
Short          Medium           blabla11                    blabla12

I would like to get the following result using Python:

Group           Size      Count     Some_other_column1      Some_other_column2

Short          Small       2            blabla1                     blabla6
Moderate       Medium      1            babla3                      blabla8
Short          Small       2            blabla2                     blabla7
Moderate       Small       1            blabla4                     blabla9
Tall           Large       1            blabla5                     blabla10
Short          Medium      1            blabla11                    blabla12

Basically, I need to count how many times each Group-Size pair occurs and store that number in a new column called, let's say, "Count", keeping all the other columns and rows unchanged. I can use pandas or anything else that helps.
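Since any tool is acceptable, here is a minimal pandas-free sketch using only the standard library. It assumes the file is comma-separated with the header shown above (the inline CSV_TEXT stands in for the real file), and add_pair_counts is a hypothetical helper name. It counts each (Group, Size) pair in a first pass, then attaches that count to every row in a second pass, so no rows are merged.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV text standing in for the real file (comma-separated here).
CSV_TEXT = """Group,Size,Some_other_column1,Some_other_column2
Short,Small,blabla1,blabla6
Moderate,Medium,babla3,blabla8
Short,Small,blabla2,blabla7
Moderate,Small,blabla4,blabla9
Tall,Large,blabla5,blabla10
Short,Medium,blabla11,blabla12
"""

def add_pair_counts(rows):
    """Return new rows with 'Count' placed right after 'Group' and 'Size'; every input row is kept."""
    # First pass: count how often each (Group, Size) pair occurs.
    counts = Counter((r["Group"], r["Size"]) for r in rows)
    # Second pass: attach the count of its pair to every row.
    result = []
    for r in rows:
        new_row = {"Group": r["Group"], "Size": r["Size"],
                   "Count": counts[(r["Group"], r["Size"])]}
        # Copy the remaining columns unchanged, preserving their order.
        for key, value in r.items():
            if key not in new_row:
                new_row[key] = value
        result.append(new_row)
    return result

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
for row in add_pair_counts(rows):
    print(row)
```

Writing the rows back out with csv.DictWriter would produce the desired file; the two-pass design is needed because a row's count is only known after the whole file has been read.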

For reference, another question was asked on this topic, but it does not solve my problem because I have multiple columns that I need to keep: Python: get a frequency count based on two columns (variables) in pandas dataframe

There is another topic here: How to assign a name to the a size() column? But it does not answer my question either, because I have two more columns (Some_other_column1/2) that the method described at that link would implicitly drop. Equally important, I do not want to merge duplicate pairs into a single row; I need to keep all of them, because they have different values in Some_other_column1/2.
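To illustrate the problem described above, here is a minimal sketch (the sample data is typed in by hand from the table in the question) contrasting an aggregating groupby(...).size(), which returns one row per distinct pair, with groupby(...).transform("size"), which returns one value per original row.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Moderate", "Short", "Moderate", "Tall", "Short"],
    "Size": ["Small", "Medium", "Small", "Small", "Large", "Medium"],
    "Some_other_column1": ["blabla1", "babla3", "blabla2", "blabla4", "blabla5", "blabla11"],
    "Some_other_column2": ["blabla6", "blabla8", "blabla7", "blabla9", "blabla10", "blabla12"],
})

# size() aggregates: one row per distinct (Group, Size) pair, so the two
# "Short/Small" rows collapse into one and the other columns are dropped.
collapsed = df.groupby(["Group", "Size"]).size().reset_index(name="Count")
print(collapsed)

# transform("size") broadcasts each group's size back onto the original rows,
# so the result has the same length and index as df.
per_row = df.groupby(["Group", "Size"])["Size"].transform("size")
print(per_row.tolist())
```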

Thanks, but it does not answer my question; I have explained why in the main question. — Catalin Bcn, Jul 13 '18 at 08:00
Hmm, but if the new column needs to be in the 3rd position, the solution is a bit different. Is that necessary? — jezrael, Jul 13 '18 at 08:17

score 0 · Accepted Answer · answered Jul 13 '18 at 08:23

You need DataFrame.insert together with GroupBy.transform('size'):

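# transform('size') returns each (Group, Size) group's size for every row; insert puts it at position 2 (3rd column)
df.insert(2, 'Count', df.groupby(['Group','Size'])['Size'].transform('size'))
print(df)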
      Group    Size  Count Some_other_column1 Some_other_column2
0     Short   Small      2            blabla1            blabla6
1  Moderate  Medium      1             babla3            blabla8
2     Short   Small      2            blabla2            blabla7
3  Moderate   Small      1            blabla4            blabla9
4      Tall   Large      1            blabla5           blabla10
5     Short  Medium      1           blabla11           blabla12
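For an end-to-end run starting from the file, a minimal sketch follows. It assumes the columns are whitespace-separated exactly as displayed in the question, hence sep=r"\s+"; for an ordinary comma-separated file, drop that argument and pass the file path instead of the inline io.StringIO sample.

```python
import io
import pandas as pd

# Sample data inlined for illustration; replace io.StringIO(DATA) with the real path.
DATA = """Group Size Some_other_column1 Some_other_column2
Short Small blabla1 blabla6
Moderate Medium babla3 blabla8
Short Small blabla2 blabla7
Moderate Small blabla4 blabla9
Tall Large blabla5 blabla10
Short Medium blabla11 blabla12
"""
df = pd.read_csv(io.StringIO(DATA), sep=r"\s+")

# Size of each (Group, Size) group, aligned back to every original row.
counts = df.groupby(["Group", "Size"])["Size"].transform("size")

# Insert as the 3rd column (position 2); all rows and other columns are kept.
df.insert(2, "Count", counts)
print(df)
```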

It doesn't need to be in the 3rd position, but it's good to know that this option is available as well. — Catalin Bcn, Jul 13 '18 at 09:28
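When the position of the new column does not matter, plain column assignment is enough. A minimal sketch on a trimmed, hand-typed version of the sample (only one of the extra columns is kept for brevity): the count is computed the same way, but the column is appended at the end instead of being inserted.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["Short", "Moderate", "Short", "Moderate", "Tall", "Short"],
    "Size": ["Small", "Medium", "Small", "Small", "Large", "Medium"],
    "Some_other_column1": ["blabla1", "babla3", "blabla2", "blabla4", "blabla5", "blabla11"],
})

# Plain assignment appends 'Count' as the last column; rows are unchanged.
df["Count"] = df.groupby(["Group", "Size"])["Size"].transform("size")
print(df)
```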
