
I have a DataFrame called Votes and I am trying to figure out how many missing values, represented as '?', it contains. The DataFrame was read with header=None, so the columns are labelled 0, 1, 2, etc. I used the following code:

Empty = Votes.loc[:,:] == '?' 

to build a boolean mask marking the missing data. I would like to sum the column tallies of that mask to see how many missing values are in the DataFrame. I used the code:

sum(Empty.sum())

to get the total, but I was unable to get the count by column. How can I get the count for each column, and then the total by summing those column tallies?
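The mask already holds everything needed: calling `.sum()` on it once gives a Series of per-column tallies, and summing that Series gives the total. A minimal sketch, assuming a small hypothetical stand-in for `Votes` since the real data isn't shown:

```python
import pandas as pd

# Hypothetical stand-in for Votes; integer column labels mimic header=None
votes = pd.DataFrame({
    0: ['y', '?', 'n', 'y'],
    1: ['?', '?', 'y', 'n'],
    2: ['n', 'y', 'y', '?'],
})

empty = votes == '?'        # boolean mask, True where a value is '?'
per_column = empty.sum()    # Series indexed by column label: '?' count per column
total = per_column.sum()    # summing the column tallies gives the overall total

print(per_column)
print('Total missing:', total)
```

Printing `per_column` shows one count per column label, so there is no need for Python's built-in `sum()` at all.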

Tomerikoo
Greg Sullivan
    Not flagging as a duplicate because these are not exact dups, but these 2 could point you in the right direction: https://stackoverflow.com/questions/32589829/how-to-get-value-counts-for-multiple-columns-at-once-in-pandas-dataframe ; https://stackoverflow.com/questions/43172116/pandas-count-some-value-in-all-columns – Tomerikoo Mar 21 '20 at 21:42
  • From the links that @Tomerikoo shared, I would do something like this:
        import pandas as pd
        votes = pd.DataFrame({0 : [3,5,'?',7], 1 : ['?',4,'?',12], 2 : [1,'?','?',7]})
        votes.apply(pd.value_counts).loc['?'].astype(int)
    – David Erickson Mar 21 '20 at 22:18
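If the data is loaded from a CSV file, another option (not from the original thread) is to have pandas treat '?' as missing while reading, then count with `isna()`. A minimal sketch; the inline CSV text is hypothetical and stands in for the real file:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file; '?' marks a missing value
csv_text = "republican,n,?\ndemocrat,?,?\n?,y,n\n"

# header=None gives integer column labels 0, 1, 2; na_values='?' reads '?' as NaN
votes = pd.read_csv(io.StringIO(csv_text), header=None, na_values='?')

missing_per_column = votes.isna().sum()   # Series: NaN count per column
total_missing = missing_per_column.sum()  # total across the whole DataFrame

print(missing_per_column)
print('Total missing:', total_missing)
```

Once the gaps are real NaN values, other pandas tools such as `dropna()` and `fillna()` also work on them directly.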



Pretty much just do what you tried, but do it for each column rather than for the whole DataFrame. A dict is a convenient way to organize the results:

import pandas as pd

df = pd.DataFrame({
    0:[1,2,'?',4,5,'?',7],
    1:['?',2,'?',4,'?',6,7],
    2:['?',2,'?',4,5,'?','?'],
})

# Count the '?' entries in each column, keyed by column label
empty = {c:(df[c] == '?').sum() for c in df.columns}

for k,v in empty.items():
    print(f'Column {k} has a total of {v} missing values.')
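To get the overall total the question asks for, the per-column tallies in the dict can simply be summed. A minimal sketch reusing the answer's sample frame; the `int()` cast is an addition here so the dict holds plain Python integers:

```python
import pandas as pd

# Same sample frame as in the answer
df = pd.DataFrame({
    0: [1, 2, '?', 4, 5, '?', 7],
    1: ['?', 2, '?', 4, '?', 6, 7],
    2: ['?', 2, '?', 4, 5, '?', '?'],
})

# Per-column tallies of '?', keyed by column label
empty = {c: int((df[c] == '?').sum()) for c in df.columns}

# Summing the column tallies gives the DataFrame-wide total
total = sum(empty.values())
print(f'Total missing values: {total}')
```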


Phillyclause89