
I'm getting the following error, which seems to be related to a null value:

An error occurred while calling o4013.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 275.0 failed 4 times, most recent failure: Lost task 0.3 in stage 275.0 (TID 415, w-----pp.net, executor 1): scala.MatchError: null (of class org.json.JSONObject$Null)

What I'm doing is first gathering data from my DB. It's a nested object, hence the long select:

myData = results.select("music.metadata.artist.*")

then:

print(myData.select("*").show())

Based on that error, I'm assuming some null data is coming in, so to remove it I tried placing the following line before calling show():

myData.na.drop()

However that doesn't help and I continue getting the same error.

Also, how can I see exactly what data is coming in when I set myData?

Otherwise, am I actually on the right track based on that error message?

Any help/ideas would be appreciated.

Thanks.

userMod2
  • Please [edit] your question and provide the context (preferably with a [reproducible example](https://stackoverflow.com/q/48427185/10938362)). – user10938362 Jun 21 '19 at 08:59
  • Did you assign it back? `myData.na.drop()` does not operate in place. You need `myData = myData.na.drop()`. – pault Jun 21 '19 at 11:54

1 Answer


The error is raised while Spark extracts myData from results, as you can see from its text (a scala.MatchError on a null of class org.json.JSONObject$Null). I would presume that you have a mistake in the schema. To understand why the error only appears when you call show, we need to look at the distinction between transformations and actions in Spark.

Operations in Spark can be divided into transformations and actions. Actions are the operations that let you see the actual results of what you've been doing, such as show, saving to disk, and collecting to the driver.

Transformations are everything else, including select statements in particular. Until an action is executed, transformations just stack up without being run, so the error is only raised when show is called, even though it may have been caused by an earlier transformation.
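To make the deferred behaviour concrete, here is a minimal sketch in plain Python rather than Spark. It is only an analogy: Python's lazy `map` stands in for a transformation and `list` stands in for an action, and `parse_artist` and `records` are made-up names for illustration.

```python
def parse_artist(record):
    # Hypothetical per-row step that fails on a missing value
    if record is None:
        raise ValueError("unexpected null record")
    return record.upper()


records = ["adele", None, "bowie"]

# "Transformation": building the lazy pipeline raises nothing,
# even though the second record is going to fail.
pipeline = map(parse_artist, records)

# "Action": consuming the pipeline finally runs parse_artist,
# and only now does the error surface.
error_message = None
try:
    result = list(pipeline)
except ValueError as exc:
    error_message = str(exc)
    print("failed at the action:", error_message)
```

The traceback points at the line that consumes the pipeline, not at the line that defined the faulty step, which is the same pattern as the error showing up at show.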

As an aside, you don't need to print the result of show: it prints the table itself and returns None.

gmds
  • Thanks for that info - I actually got it working by updating my actual SQL query at the very start - as you pointed out. – userMod2 Jun 24 '19 at 01:31