
I have a column name and a DataFrame. I want to check whether all values in that column are empty, and if so, drop the column from the DataFrame.

What I did was check the count of non-null values in the column and drop it if the count equals 0, but that seems like an expensive operation in PySpark.
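For reference, this is roughly what I'm doing now (a minimal sketch; `df` and `col_name` stand in for my actual DataFrame and column name):

```python
from pyspark.sql import functions as F

# Count non-null values in the column; this triggers a scan of the data.
non_null_count = df.filter(F.col(col_name).isNotNull()).count()

# Drop the column if every value is null.
if non_null_count == 0:
    df = df.drop(col_name)
```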

sks27
  • Add what you have done, else you'll be downvoted soon. – Rex5 Aug 09 '19 at 03:27
  • Added what I tried – sks27 Aug 09 '19 at 03:30
  • See if [this](https://stackoverflow.com/questions/44627386/how-to-find-count-of-null-and-nan-values-for-each-column-in-a-pyspark-dataframe?rq=1) or [this one](https://stackoverflow.com/questions/37262762/filter-pyspark-dataframe-column-with-none-value) helps. – Rex5 Aug 09 '19 at 03:33

1 Answer


The way you are doing it is the right way. Regarding performance, you might want to cache your DataFrame (if it fits into memory).
Also consider doing the check on a subset (or even only the first few rows) of your DataFrame first, in order to rule out columns that are definitely not always null. This should reduce the number of columns you have to check on the full data.
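A rough sketch of what I mean, assuming a DataFrame `df` (the sample size and variable names are just placeholders):

```python
from pyspark.sql import functions as F

# Cache the DataFrame so repeated checks don't re-read the source.
df.cache()

# Cheap pre-check on a small sample: any column with a non-null value
# here can be ruled out immediately.
sample = df.limit(1000).collect()
candidates = [c for c in df.columns
              if all(row[c] is None for row in sample)]

# Confirm on the full data only for the remaining candidate columns,
# counting non-null values for all of them in a single pass.
if candidates:
    counts = df.select(
        [F.count(F.col(c)).alias(c) for c in candidates]
    ).first()
    to_drop = [c for c in candidates if counts[c] == 0]
    df = df.drop(*to_drop)
```

Checking all candidate columns in one `select` keeps it to a single job over the full data instead of one count per column.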

Paul