
I am a complete beginner with Python and pandas. My dataset contains ? values, which are not NaN or null. I want to count how many ? values appear in certain columns.

I tried value_counts() and other counting functions, but they did not work. I want to count how many ? values there are in the workclass column. Thanks.


I would like a way to do this without using scikit-learn or another ML library.

Ynjxsjmh

2 Answers


You can use the isin function for this. Here is an example:

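import pandas as pd

arr = {'col1': ['?', 2, '?', 4], 'col2': [6, 7, 8, 9]}
df = pd.DataFrame(arr)

# isin marks every cell equal to '?' as True; sum() then counts the True values per column
df.isin(['?']).sum()

Output:

col1    2
col2    0
dtype: int64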
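For the single column named in the question, a comparison on that column may be enough. The sketch below uses a small made-up frame; only the column name workclass comes from the question, and the values are hypothetical. It also assumes that some entries might carry stray whitespace (for example " ?"), which is one possible reason an exact match would undercount.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; only the column name is from the question.
df = pd.DataFrame({"workclass": ["Private", "?", " ?", "State-gov", "?"]})

# Exact-match count: only cells that are exactly "?" are counted.
exact = (df["workclass"] == "?").sum()

# Whitespace-tolerant count: strip surrounding spaces before comparing.
stripped = (df["workclass"].str.strip() == "?").sum()

print(exact, stripped)
```

If the two counts differ, the column probably contains padded values, and stripping the column before counting is likely the safer choice.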
Sifat Haque

Your first task should be to replace every ? with NaN or None so that you can use built-in pandas functions to count them easily.

import pandas as pd
import numpy as np

data = {'number': [1, 2, 3, 4, '?', 5],
        'string': ['a', 'b', 'c', 'd', '?', 'e']
       }

df = pd.DataFrame(data)

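# Replace the '?' placeholder with NaN in both columns.
# np.nan is used because the np.NaN alias was removed in NumPy 2.0.
df['number'] = df['number'].replace('?', np.nan)
df['string'] = df['string'].replace('?', np.nan)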

Now you can count the number of missing values.

df.isna().sum()

Output:

number    1
string    1
dtype: int64
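If the data comes from a CSV file, the same conversion can be done while loading, using the na_values parameter of read_csv. This is a minimal sketch: the inline CSV text and the age column are invented for illustration, and it assumes the placeholder is exactly ? with no surrounding spaces.

```python
import io
import pandas as pd

# Hypothetical CSV content; a real file path would go in place of the StringIO object.
csv_text = "age,workclass\n39,State-gov\n50,?\n38,Private\n53,?\n"

# na_values tells read_csv to treat "?" as a missing value while parsing.
df = pd.read_csv(io.StringIO(csv_text), na_values="?")

# Count missing values per column.
missing = df.isna().sum()
print(missing)
```

With this approach no separate replace step is needed, and isna().sum() works directly on the loaded frame.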
Stu Sztukowski