0

Got a csv file with a few columns and the data contains null values for certain column. using pandas dataframe function, how can I Print the total number of columns that contain null values & null column names to a output csv file?

Output.csv 2 # no of columns Column a Column b

tommy2o
  • 13
  • 2
  • can you add a sample dataframe and the expected output too? thanks – anky Jun 09 '19 at 17:42
  • item no, city, state, total_sales expected output to a csv file. 2 # no of columns Column a Column c – tommy2o Jun 09 '19 at 18:10
  • no no no,, you should definately check this for any future questions: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – anky Jun 09 '19 at 18:11

3 Answers3

1

I prepared the test data as follows:

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(5, 10)), columns=list('ABCDEFGHIJ'))
df[df > 0.9] = pd.np.nan; df

To get column names containig NaN values, run:

nn = df.isnull().any()

For my test data, the result is:

A     True
B    False
C    False
D     True
E    False
F    False
G    False
H     True
I     True
J    False
dtype: bool

We are actually interested in index values where the value is True. To get them, run:

nullCols = nn.index[nn].tolist()

The result is:

['A', 'D', 'H', 'I']

And to get the number of such columns, run:

len(nullCols)

The result is 4.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
0

Try:

pd.DataFrame({'Columns': a.columns[a.isnull().any()], 'Count':len([x for x in a.isnull().any().values if x == True])}).to_csv('myfilepath.csv')

Where a is your dataframe name, and change any() to all() is you want to check whether or not the whole column is empty or not.

saph_top
  • 677
  • 1
  • 6
  • 23
0

Here is the code that will help you attain the above:

    df=pd.DataFrame({'Name':["abc","def",None],'Age':[1,None,3],'Address':["rst","uvw","xyz"]})

    null_colname=df.columns[df.isnull().any()].tolist() #find columns which returns True for null testing and convert the column names to list
    null_colnum=len(null_colname)                       # take length of the above list

    p=str(null_colnum)+"# of columns:"                  # initialize string in the format of required output
    for i in range(0,null_colnum):                      #iterate over the list
        p=p+'Column-'+null_colname[i]+' '               # concatenate column names to the string


    text_file = open(filepath+"Output.csv", "w")        #export to csv
    text_file.write("%s" % p)
    text_file.close()
  • Thank you Rimil. How to adjust the output.csv so it outputs # of columns in a row and the column names in each row separately.? – tommy2o Jun 09 '19 at 22:12
  • As per my understanding, you want the first row to be the # of columns and the respective rows be the column names. You can do it it following ways: `code`list_col=list(null_colname) text_file = open(filepath+"Output.csv", "w") #export to csv text_file.write(p) text_file.write("\n") for col_name in list_col: text_file.write(col_name) text_file.write("\n") text_file.close() `code` or you can convert this to dataframe 'code' df_final=pd.DataFrame() df_final[p]=list_col df_final.to_csv(filepath+"\\output.csv",index=False) `code` – RIMIL HEMBROM Jun 12 '19 at 05:57