
In R, the is.na() function returns a data frame of the same shape, where NA (null) values are TRUE and non-NA values are FALSE:

col1  col2
NA    1
1     NA
NA    NA
1     1

is.na() -->

col1   col2
TRUE   FALSE
FALSE  TRUE
TRUE   TRUE
FALSE  FALSE

I'm wondering if there is an equivalent PySpark function that returns the full DataFrame populated with True/False values. I don't want to use PySpark's filter/where, since that would not return the full dataset.
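To make the expected behaviour concrete, here is the example data in PySpark, with the kind of call I'm hoping exists sketched in a comment (`null_map` is just a name I made up):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The example data from above (None is Spark's null)
df = spark.createDataFrame(
    [(None, 1), (1, None), (None, None), (1, 1)],
    ["col1", "col2"],
)

# Hypothetical: a null_map(df)-style call that would return the same
# rows and columns, with True where a value is null and False where
# it is not, without filtering any rows out
# bool_df = null_map(df)
```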

Thanks in advance!

PS: If my formatting is off, please let me know; this is my first Stack Overflow post, so I'm not 100% sure how the formatting works.

kxxlxn
  • Possibly relevant: https://stackoverflow.com/q/37262762/3358272 and https://sparkbyexamples.com/pyspark/pyspark-filter-rows-with-null-values/ – r2evans Jun 29 '21 at 13:12
  • Thanks @r2evans, that's not quite what I was looking for, since both of those filter records out of the dataset. Instead I found a solution: `df.withColumn('col1', when(df['col1'].isNull(), lit(True)).otherwise(lit(False)))`, repeated for col2 (with `when` and `lit` imported from `pyspark.sql.functions`). This returns the whole dataset with every null value replaced by True and every non-null value by False. – kxxlxn Jun 30 '21 at 15:29
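
For reference, a runnable version of the approach from the comment above, generalized over both columns (a sketch, assuming a local SparkSession and the example data from the question):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import when, lit

spark = SparkSession.builder.getOrCreate()

# The example frame from the question (None is Spark's null)
df = spark.createDataFrame(
    [(None, 1), (1, None), (None, None), (1, 1)],
    ["col1", "col2"],
)

# Apply the comment's when/isNull/otherwise pattern to every column
for c in df.columns:
    df = df.withColumn(c, when(df[c].isNull(), lit(True)).otherwise(lit(False)))

df.show()
# +-----+-----+
# | col1| col2|
# +-----+-----+
# | true|false|
# |false| true|
# | true| true|
# |false|false|
# +-----+-----+
```

Since `Column.isNull()` already yields a boolean column, the `when`/`otherwise` wrapper is optional; `df.select([df[c].isNull().alias(c) for c in df.columns])` produces the same True/False frame in one step.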
