
I'm fairly new to this. I'm trying to figure out how to calculate the percentage of elementName values that are True/False after a groupby command. Instead of a count, I need a percentage.

I'd appreciate any kind of help. Here's how my data looks:

comp  isB    element  FY
1750  false  62       62
      true   305      305
1800  false  52       52
      true   356      356
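Assuming the output above is a Series of counts indexed by (comp, isB), one way to turn it into percentages is to divide each count by the total for its comp. The sketch below rebuilds that Series by hand from the numbers shown (a hypothetical reconstruction, not the asker's actual code) and uses groupby(level="comp").transform("sum") to get the per-comp totals aligned with each row.

```python
import pandas as pd

# Hypothetical reconstruction of the grouped counts shown above:
# one count per (comp, isB) pair.
counts = pd.Series(
    [62, 305, 52, 356],
    index=pd.MultiIndex.from_tuples(
        [(1750, False), (1750, True), (1800, False), (1800, True)],
        names=["comp", "isB"],
    ),
    name="element",
)

# Total per 'comp', broadcast back to every (comp, isB) row.
group_totals = counts.groupby(level="comp").transform("sum")

# Share of each isB value within its comp, scaled to percent.
percent = counts / group_totals * 100
print(percent.round(2))
```

Because transform returns a Series with the same index as counts, the division lines up row by row without any merge.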
Ben Voigt
A1danka

2 Answers


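You could just use .mean(): when averaging, booleans are treated as 1 (True) and 0 (False), so the mean of a boolean column within each group is the fraction of True values. Multiply by 100 to get a percentage.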

In [17]: import pandas as pd

In [18]: import numpy as np

In [19]: df = pd.DataFrame({'a': np.random.choice([True, False], size=10),
                            'b': np.random.choice(['x', 'y'], size=10)})

In [20]: df
Out[20]: 
       a  b
0  False  x
1   True  y
2  False  y
3   True  x
4   True  y
5  False  y
6  False  x
7  False  y
8   True  x
9   True  y

In [21]: df.groupby(['b']).mean()
Out[21]: 
     a
b     
x  0.5
y  0.5
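A minimal sketch of the same idea, using fixed (hypothetical) data instead of np.random so the result is reproducible, and scaled to percent. It also shows SeriesGroupBy.value_counts(normalize=True), which reports the share of both True and False in each group rather than only the True share.

```python
import pandas as pd

# Hypothetical, deterministic data in place of np.random.choice.
df = pd.DataFrame({
    "a": [True, False, True, True, False, False],
    "b": ["x", "x", "x", "y", "y", "y"],
})

# Mean of a boolean column per group = fraction of True values; scale to percent.
pct_true = df.groupby("b")["a"].mean() * 100

# Share of every value (True and False) within each group, in percent.
pct_both = df.groupby("b")["a"].value_counts(normalize=True) * 100

print(pct_true)
print(pct_both)
```

Here group x has two True values out of three and group y has one out of three, so pct_true is about 66.67 and 33.33.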

Daniel Severo
# Print original DataFrame
>>> df

    comp    isB     element FY
0   1750    False   62      62
1   1750    True    305     305
2   1800    False   52      52
3   1800    True    356     356

# Sum number of elements
>>> df['total_count'] = df.groupby('comp')['element'].transform('sum')
>>> df

    comp    isB     element FY  total_count
0   1750    False   62      62      367
1   1750    True    305     305     367
2   1800    False   52      52      408
3   1800    True    356     356     408

# Calculate fraction or percent according to preference
>>> df['fraction'] = df['element'] / df['total_count']
>>> df['percent'] = df['fraction'] * 100
>>> df

    comp    isB     element FY  total_count fraction    percent
0   1750    False   62      62  367         0.168937    16.893733
1   1750    True    305     305 367         0.831063    83.106267
2   1800    False   52      52  408         0.127451    12.745098
3   1800    True    356     356 408         0.872549    87.254902

# Get series using group-by
>>> df.groupby(['comp', 'isB'])['percent'].max()

comp  isB  
1750  False    16.893733
      True     83.106267
1800  False    12.745098
      True     87.254902
Name: percent, dtype: float64
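A condensed sketch of the steps above, assuming the same four-row DataFrame (values copied from the printed output). Selecting the 'element' column before calling transform('sum') means only that column is summed, instead of transforming every column of the frame.

```python
import pandas as pd

# The four rows printed above.
df = pd.DataFrame({
    "comp": [1750, 1750, 1800, 1800],
    "isB": [False, True, False, True],
    "element": [62, 305, 52, 356],
    "FY": [62, 305, 52, 356],
})

# Per-comp total of 'element', aligned with the original rows.
totals = df.groupby("comp")["element"].transform("sum")
df["percent"] = df["element"] / totals * 100

# Each (comp, isB) pair occurs once, so set_index yields the same values
# as the groupby(...)['percent'].max() step.
result = df.set_index(["comp", "isB"])["percent"]
print(result)
```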
ulmefors
  • Is there a way to transform string elements? Since my 'element' column is made up of many string values, I get the error message "cannot reshape array of size 2448 into shape (408,7)". – A1danka Mar 12 '19 at 15:25
  • Please post a new question, following the advice here: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – ulmefors Mar 13 '19 at 06:58
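If 'element' holds strings and each row is one element, summing it is not meaningful, and counting rows may be what is wanted instead. A sketch under that assumption, with hypothetical data: value_counts(normalize=True) on isB within each comp gives the share of True and False rows per comp, without needing a numeric column.

```python
import pandas as pd

# Hypothetical data: one row per element, 'element' holds strings.
df = pd.DataFrame({
    "comp": [1750, 1750, 1750, 1800, 1800],
    "isB": [True, False, True, True, True],
    "element": ["a", "b", "c", "d", "e"],
})

# Count rows per (comp, isB), normalized within each comp, then scale to percent.
percent = df.groupby("comp")["isB"].value_counts(normalize=True) * 100
print(percent)
```

Note that a comp with no False rows (1800 here) simply has no False entry in the result rather than a 0.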