An elegant way to count the occurrence of '?'
or any symbol in any column, is to use built-in function isin
of a dataframe object.
Suppose that we have loaded the 'Automobile' dataset into df
object.
We do not know which columns contain missing value ('?'
symbol), so let do:
df.isin(['?']).sum(axis=0)
DataFrame.isin(values)
official document says:
it returns boolean DataFrame showing whether each element in the DataFrame
is contained in values
Note that isin
accepts an iterable as input, thus we need to pass a list containing the target symbol to this function. df.isin(['?'])
will return a boolean dataframe as follows.
symboling normalized-losses make fuel-type aspiration-ratio ...
0 False True False False False
1 False True False False False
2 False True False False False
3 False False False False False
4 False False False False False
5 False True False False False
...
To count the number of occurrence of the target symbol in each column, let's take sum
over all the rows of the above dataframe by indicating axis=0
.
The final (truncated) result shows what we expect:
symboling 0
normalized-losses 41
...
bore 4
stroke 4
compression-ratio 0
horsepower 2
peak-rpm 2
city-mpg 0
highway-mpg 0
price 4