Task:
Let df be a Spark DataFrame. We want to replace every occurrence of a value n in df by NA.
In plain R I would simply write
df[df == n] <- NA
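For concreteness, this is the base R behaviour I am trying to reproduce (the toy data and the placeholder value n <- -99 are made up for illustration):

# Plain R data.frame: every cell equal to n becomes NA in a single assignment
n <- -99
rdf <- data.frame(a = c(1, -99, 3), b = c(-99, 5, 6))
rdf[rdf == n] <- NA
rdf   # the two -99 entries now show up as NA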
Problems / questions (as I am new to Spark, any comment is welcome):
- What is the equivalent of NA in SparkR? I found functions like isNull and isNaN and I am confused about whether there is a difference between them.
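For what it is worth, here is the kind of toy comparison I had in mind to tell the two apart (the column name x and the sample values are made up); my current understanding, which may well be wrong, is that an R NA becomes a Spark null, so isNull would be the relevant test, while isNaN only matches the floating-point NaN value:

library(SparkR)
sparkR.session()

# Toy column containing an ordinary value, an NA and a NaN
sdf <- createDataFrame(data.frame(x = c(1, NA, NaN)))

# Put the two predicates side by side:
# isNull() tests for Spark's null, isNaN() tests for the numeric NaN value
collect(select(sdf, sdf$x, isNull(sdf$x), isNaN(sdf$x)))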
I was able to do it on one column, col1, using ifelse, i.e.
df[[col1]] <- ifelse(df[[col1]] == n, NA, df[[col1]])
but I was not able to "parallelize" it across all the columns.
I tried:
df <- spark.lapply(colnames(df), function(x) { ifelse(df[[x]] == n, NA, df[[x]]) })
but it aborted with the message
Job aborted due to stage failure
which I do not understand.
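To clarify what I mean by "parallelize": the end result I am after is what the plain driver-side loop below produces (again with made-up toy data and n <- -99), only done in whatever the proper Spark way is instead of looping over the columns one by one:

library(SparkR)
sparkR.session()

n <- -99
df <- createDataFrame(data.frame(a = c(1, -99, 3), b = c(-99, 5, 6)))

# Rebuild each column with ifelse(): cells equal to n become NA (null in Spark),
# everything else is kept as is; this just repeats the single-column trick above
for (x in colnames(df)) {
  df[[x]] <- ifelse(df[[x]] == n, NA, df[[x]])
}

collect(df)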