Python Pandas Counting the Occurrences of a Specific value

Question

I am trying to find the number of times a certain value appears in one column.

I have made the dataframe with data = pd.DataFrame.from_csv('data/DataSet2.csv')

and now I want to find the number of times something appears in a column. How is this done?

I thought it was the below, where I am looking in the education column and counting the number of time ? occurs.

The code below shows that I am trying to find the number of times 9th appears and the error is what I am getting when I run the code

Code

missing2 = df.education.value_counts()['9th']
print(missing2)

Error

KeyError: '9th'

`df.education.value_counts()['?']` should be correct. What does it give you instead? (show some code and actual output) — joris, Feb 08 '16 at 18:59
Could be generalized as `(df['Column'].values == 'a').sum()` — Wizhi, Jul 12 '20 at 13:41

jezrael · Accepted Answer · 2019-01-16T07:40:56.160

You can create subset of data with your condition and then use shape or len:

print df
  col1 education
0    a       9th
1    b       9th
2    c       8th

print df.education == '9th'
0     True
1     True
2    False
Name: education, dtype: bool

print df[df.education == '9th']
  col1 education
0    a       9th
1    b       9th

print df[df.education == '9th'].shape[0]
2
print len(df[df['education'] == '9th'])
2

Performance is interesting, the fastest solution is compare numpy array and sum:

Code:

import perfplot, string
np.random.seed(123)


def shape(df):
    return df[df.education == 'a'].shape[0]

def len_df(df):
    return len(df[df['education'] == 'a'])

def query_count(df):
    return df.query('education == "a"').education.count()

def sum_mask(df):
    return (df.education == 'a').sum()

def sum_mask_numpy(df):
    return (df.education.values == 'a').sum()

def make_df(n):
    L = list(string.ascii_letters)
    df = pd.DataFrame(np.random.choice(L, size=n), columns=['education'])
    return df

perfplot.show(
    setup=make_df,
    kernels=[shape, len_df, query_count, sum_mask, sum_mask_numpy],
    n_range=[2**k for k in range(2, 25)],
    logx=True,
    logy=True,
    equality_check=False, 
    xlabel='len(df)')

score 37 · Answer 2 · answered Sep 06 '17 at 17:08

Couple of ways using count or sum

In [338]: df
Out[338]:
  col1 education
0    a       9th
1    b       9th
2    c       8th

In [335]: df.loc[df.education == '9th', 'education'].count()
Out[335]: 2

In [336]: (df.education == '9th').sum()
Out[336]: 2

In [337]: df.query('education == "9th"').education.count()
Out[337]: 2

score 25 · Answer 3 · answered Sep 20 '18 at 13:52

An elegant way to count the occurrence of '?' or any symbol in any column, is to use built-in function isin of a dataframe object.

Suppose that we have loaded the 'Automobile' dataset into df object. We do not know which columns contain missing value ('?' symbol), so let do:

df.isin(['?']).sum(axis=0)

DataFrame.isin(values) official document says:

it returns boolean DataFrame showing whether each element in the DataFrame is contained in values

Note that isin accepts an iterable as input, thus we need to pass a list containing the target symbol to this function. df.isin(['?']) will return a boolean dataframe as follows.

    symboling   normalized-losses   make    fuel-type   aspiration-ratio ...
0   False       True                False   False       False
1   False       True                False   False       False
2   False       True                False   False       False
3   False       False               False   False       False
4   False       False               False   False       False
5   False       True                False   False       False
...

To count the number of occurrence of the target symbol in each column, let's take sum over all the rows of the above dataframe by indicating axis=0. The final (truncated) result shows what we expect:

symboling             0
normalized-losses    41
...
bore                  4
stroke                4
compression-ratio     0
horsepower            2
peak-rpm              2
city-mpg              0
highway-mpg           0
price                 4

score 16 · Answer 4 · edited Aug 10 '18 at 19:08

16

Try this:

(df[education]=='9th').sum()

edited Aug 10 '18 at 19:08

mayo

3,845
1
32
42

answered Aug 10 '18 at 18:28

Emmanuel Steiner

161
1
3

5

This could be improved by putting the code in a code block (indent 4 spaces) and with an explanation of what the code is doing. – Jerod Johnson Aug 10 '18 at 18:53
This will return a `pd.Series` with boolean value of `True` for when an entry in the column `df['education']` == `'9th'` and `False` otherwise. `sum()` will count the number of `True` values – ruslaniv Aug 23 '20 at 09:08

score 6 · Answer 5 · answered Apr 22 '19 at 09:08

6

easy but not efficient:

list(df.education).count('9th')

answered Apr 22 '19 at 09:08

keramat

4,328
6
25
38

score 6 · Answer 6 · answered May 25 '21 at 11:02

Simple example to count occurrences (unique values) in a column in Pandas data frame:

import pandas as pd

# URL to .csv file 
data_url = 'https://yoursite.com/Arrests.csv'
# Reading the data 
df = pd.read_csv(data_url, index_col=0)
# pandas count distinct values in column 
df['education'].value_counts()

Outputs:

Education        47516 
9th              41164 
8th              25510 
7th              25198 
6th              25047                       
...  
3rd                 2 
2nd                 2 
1st                 2 
Name: name, Length: 190, dtype: int64

shyam_gupta · Answer 7 · 2021-08-08T12:27:51.590

4

for finding a specific value of a column you can use the code below

irrespective of the preference you can use the any of the method you like

df.col_name.value_counts().Value_you_are_looking_for

take example of the titanic dataset

df.Sex.value_counts().male

this gives a count of all male on the ship Although if you want to count a numerical data then you cannot use the above method because value_counts() is used only with series type of data hence fails So for that you can use the second method example

the second method is

#this is an example method of counting on a data frame
df[(df['Survived']==1)&(df['Sex']=='male')].counts()

this is not that efficient as value_counts() but surely will help if you want to count values of a data frame hope this helps

EDIT -- If you wanna look for something with a space in between

you may use

df.country.count('united states') I believe this should solve the problem

edited Aug 08 '21 at 12:27

answered Jun 10 '20 at 07:08

shyam_gupta

309
2
11

what if the we want to find the count for a entry having spacing, for ex: United States. I tired replacing empty space with underscore, it didn't work. – Jayakrishnan Aug 07 '21 at 22:57
the first method wont work , but the second method will – shyam_gupta Aug 08 '21 at 12:21
for ex:- df[df['country']=='united states'].counts() – shyam_gupta Aug 08 '21 at 12:22

score 0 · Answer 8 · edited Aug 26 '22 at 14:09

0

I think this could be a more easy solution. Suppose you have the following data frame.

DATE        LANG        POSTS
2008-07-01  c#          3
2008-08-01  assembly    8
2008-08-01  javascript  2
2008-08-01  c           85
2008-08-01  python      11
2008-07-01  c#          3
2008-08-01  assembly    8
2008-08-01  javascript  62
2008-08-01  c           85
2008-08-01  python      14

you can find the occurrence of LANG item's sum like this

df.groupby('LANG').sum()

and you will have the sum of each individual language

edited Aug 26 '22 at 14:09

Vitalizzare

4,496
7
13
32

answered Aug 20 '22 at 15:51

Anees Ahmad

3
5

Please format this answer. – nigel239 Aug 24 '22 at 10:55

Python Pandas Counting the Occurrences of a Specific value

8 Answers8

Linked

Related