
Task: Let df be a Spark data frame. We want to replace a value n in df with NA.

In R I would simply write

df[df==n] <- NA

Problems / questions (as I am new to Spark, any comment is welcome):

  • What is the equivalent in SparkR of NA? I found functions like isNull and isNaN and I am not sure whether and how they differ (a short sketch follows below).
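
As far as I understand, isNull() tests for SQL NULL (which is what R's NA becomes in a SparkDataFrame), while isNaN() tests for the floating-point value NaN. A minimal sketch of the difference (col1 is just a placeholder column name):

library(SparkR)

# rows where col1 is SQL NULL, i.e. the Spark counterpart of R's NA
null_rows <- filter(df, isNull(df$col1))

# rows where col1 holds the floating-point value NaN (a different concept)
nan_rows <- filter(df, isNaN(df$col1))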

I was able to do it on one column col1 using ifelse, i.e.

df[[col1]] <- ifelse(df[[col1]] == n, NA, df[[col1]])

but I was not able to "parallelize" it across all columns.

I tried:

df <- spark.lapply(colnames(df), function(x) { ifelse(df[[x]] == n, NA, df[[x]]) })

but I got the message

Job aborted due to stage failure

which I do not understand.


1 Answer


Some solutions that may help troubleshoot that error:

  • Job aborted due to stage failure: Task from application
  • how-to-handle-null-entries-in-sparkr
  • Add a column full of NAs in Sparkr
  • SparkR API
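
One way to extend the single-column ifelse from the question to all columns is to loop over colnames(df) on the driver, so each iteration just adds a Column expression; spark.lapply ships an R function to the worker processes, where the SparkDataFrame df cannot be referenced, which is one common way to end up with "Job aborted due to stage failure". A minimal, untested sketch (n is a placeholder for the value to replace, and it assumes every column can be compared to n):

library(SparkR)

n <- 999  # placeholder: the value that should become NA

# loop on the driver: each assignment only extends the query plan,
# nothing is computed until an action (e.g. collect) is called
for (x in colnames(df)) {
  df[[x]] <- ifelse(df[[x]] == n, NA, df[[x]])
}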

  • Thank you for your answer, but apart from the first link none of them deals with the problem / task, i.e. 1) how to apply a user-defined function in general? spark.lapply is, for example, mentioned in your link to the SparkR documentation, so why does my "code" not work? Where is the gap in my understanding? 2) As somebody who is familiar with R, I thought there might exist an easy solution for such a specific problem in SparkR. – Christian Jan 05 '19 at 09:20
  • I don't see 1) being asked anywhere in the question. Read the SparkR API; you may find some clues there. – Marc0 Jan 06 '19 at 21:17