2

I want to calculate the percentage of my Products column according to the occurrences per related Country. I would greatly appreciate your help.

Here is what I did so far, I calculated my new dataframe with this code:

gb = data1.groupby(['Country', 'Products']).size()
df = gb.to_frame(name = 'ProductsCount').reset_index()
df

Which gives me something that look like this:

   Countries    Products     ProductsCount
0  Country 1     Product 1     5
1  Country 1     Product 2     31
2  Country 2     Product 1     2
3  Country 2     Product 2     1

Note: I have a couple of thousands rows of output.

My goal is to get the percentage per each products according to the country directly without calculating the ['ProductsCount'], like this:

   Countries    Products     Percentage
0  Country 1     Product 1     0.138
1  Country 1     Product 2     0.861
2  Country 2     Product 1     0.667
3  Country 2     Product 2     0.333

Otherwise If I can't get the the output to show only the %, then I would like something like this:

   Countries    Products     ProductsCount   Products%
0  Country 1     Product 1     5                0.138
1  Country 1     Product 2     31               0.861
2  Country 2     Product 1     2                0.667
3  Country 2     Product 2     1                0.333

I managed to calculate only the % according to the whole dataset using this code:

df['Products%'] = df.ProductsCount/len(df.Country)

Thank you in advance!

Killi Mandjaro
  • 145
  • 2
  • 15
  • Well, what is the rule that tells you, for example, that the result should be `0.138` given `5` products, or whatever? What is it a percentage *of*? When you say "I managed to calculate only the % according to the whole dataset using this code", *what is different between that result, and an actual solution to the problem*? I can't understand what it is that you actually need help with. – Karl Knechtel Oct 16 '20 at 07:38
  • What I meant was I managed to calculate the frequency of, say, product 1 on all the entries of my dataset but not grouped per countries. 100% was the whole dataset but I wanted 100% to be the number of times the product occurred per country. @jezrael found the solution to my question – Killi Mandjaro Oct 16 '20 at 10:16

1 Answers1

0

Use SeriesGroupBy.value_counts with normalize=True parameter:

df = (data1.groupby('Countries')['Products']
           .value_counts(normalize=True,sort=False)
           .reset_index(name='Percentage'))
print (df)
   Countries   Products  Percentage
0  Country 1  Product 1    0.138889
1  Country 1  Product 2    0.861111
2  Country 2  Product 1    0.666667
3  Country 2  Product 2    0.333333

EDIT:

df = (data1.groupby('Countries')['Products']
           .value_counts(sort=False)
           .reset_index(name='ProductsCount')
           .assign(Percentage = lambda x: x['ProductsCount'].div(len(x))))
print (df)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • What if I want the other columns from the original dataset data1 to show as well? – Killi Mandjaro Oct 16 '20 at 09:59
  • @KilliMandjaro - Added answer, or use your solution form question – jezrael Oct 16 '20 at 10:03
  • Thanks for the update, but I meant in the data1 that I'm actually working with I have other columns not mentioned here. Like Price, Quantity, Gender and so on... How could I also show them? – Killi Mandjaro Oct 16 '20 at 10:45
  • @KilliMandjaro - I think you need [this](https://stackoverflow.com/questions/47360510/pandas-groupby-and-aggregation-output-should-include-all-the-original-columns-i) – jezrael Oct 16 '20 at 10:47
  • Would you be able to explain it? I'm new to python, thus I can't figure out how to implement it for my case – Killi Mandjaro Oct 16 '20 at 12:57
  • @KilliMandjaro - I think the best should be create new question with sample data, expectedoutput and what you try. – jezrael Oct 16 '20 at 12:58