
How do I count all the cells under 'Category' with pandas in Python? I tried df['Category'].value_counts(), but that gives me this output:

Engineering & Information Technology        1159
Manufacturing                               1044
Vehicle Service                              915
Supply Chain                                 378
Energy - Solar & Storage                     374
Construction & Facilities                    296
Sales & Customer Support                     269
Finance                                      119
Charging                                     115
Environmental, Health & Safety                93
Autopilot & Robotics                          78
Operations & Business Support                 75
HR                                            64
Design                                        59
Vehicle Software                              40
Legal & Government Affairs                    18
External Relations & Employee Experience       2
Name: Category, dtype: int64

This output only gives me a breakdown of the number of occurrences of each label in 'Category'. What I want is simply the total number of cells under 'Category', so essentially I want to add up all the numbers in the right column. How do I do that?

Here is what the original data looks like (all text):

Title   Category    Location
0   Technical Product Analyst   Engineering & Information Technology    Draper, Utah
1   Software Engineer   Engineering & Information Technology    Austin, Texas
2   Software Development Engineer   Engineering & Information Technology    Fremont, California
3   Global Supply Analyst   Supply Chain    Palo Alto, California
4   Software Support Engineer, Battery Automation ...   Engineering & Information Technology    Austin, Texas
Michelle

2 Answers


df.Category.sum() should do the trick.

Igor Rivin
  • I thought that would do it, but it spits out: "'Engineering & Information TechnologyEngineering & Information TechnologyEngineering & Information TechnologySupply ChainEngineering & Information TechnologyEngineering & Information TechnologyManufacturingManufacturingSupply ChainEnergy - Solar & StorageVehicle ServiceVehicle SoftwareEnergy - Solar & StorageManufacturingVehicle ServiceManufacturingVehicle ServiceFinanceVehicle ServiceEnvironmental, Health & SafetyEngineering & Information TechnologyEnvironmental, Health & SafetyManufacturingManufacturingSales & Customer SupportEnergy ..." – Michelle Dec 25 '22 at 03:57
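The comment shows why sum() alone does not work here: on a string (object-dtype) column, Series.sum() concatenates the values instead of counting them. A minimal sketch, using a few hypothetical rows modeled on the question's data (not the real dataset), of two ways to get the total instead: sum the integer counts that value_counts() returns, or call count(), which returns the number of non-missing cells directly.

```python
import pandas as pd

# Hypothetical sample modeled on the question's 'Category' column.
# dtype=object mirrors a classic pandas string column.
df = pd.DataFrame(
    {
        "Category": [
            "Engineering & Information Technology",
            "Engineering & Information Technology",
            "Supply Chain",
            "Manufacturing",
        ]
    },
    dtype=object,
)

# sum() on an object column concatenates the strings (what the comment saw)
concatenated = df["Category"].sum()

# value_counts() gives an integer count per label; summing them gives the total
total_from_counts = df["Category"].value_counts().sum()

# count() gives the number of non-missing cells in the column directly
total_non_null = df["Category"].count()

print(total_from_counts, total_non_null)  # 4 4
```

Both approaches skip missing values by default (value_counts() uses dropna=True), so they agree; len(df) would instead count every row, including rows where 'Category' is missing.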

Example

We need a minimal, reproducible example to answer this, so let's make one:

import pandas as pd

data = [['A', 'upper'], ['c', 'lower'], ['d', 'lower'], ['A', 'upper'],
        ['B', 'upper'], ['e', 'lower'], ['d', 'lower'], ['d', 'lower']]
df = pd.DataFrame(data, columns=['col1', 'category'])

df

    col1    category
0   A       upper
1   c       lower
2   d       lower
3   A       upper
4   B       upper
5   e       lower
6   d       lower
7   d       lower

Code

out = df.groupby('category')['col1'].agg(pd.Series.nunique)

out

category
lower    3                 <-- c, d, e
upper    2                 <-- A, B
Name: col1, dtype: int64
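To put the nunique result next to the total the question asks for, here is a small sketch on the same toy data. The size() and count() calls are additions for comparison, not part of the answer above: nunique() counts distinct col1 values per group, size() counts all rows per group (duplicates included), and count() on the column gives the overall number of non-missing cells.

```python
import pandas as pd

# Same toy data as the example above
data = [['A', 'upper'], ['c', 'lower'], ['d', 'lower'], ['A', 'upper'],
        ['B', 'upper'], ['e', 'lower'], ['d', 'lower'], ['d', 'lower']]
df = pd.DataFrame(data, columns=['col1', 'category'])

# Distinct col1 values per category: lower -> 3 (c, d, e), upper -> 2 (A, B)
distinct_per_group = df.groupby('category')['col1'].nunique()

# All rows per category, duplicates included: lower -> 5, upper -> 3
rows_per_group = df.groupby('category').size()

# Total number of non-missing cells in 'category': 8
total = df['category'].count()
```

Note that rows_per_group sums to total, which is the same relationship as summing the value_counts() output in the question.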
Panda Kim