I've just started learning Spark, and quite a few things have come up that alarm me. One of the simplest is that some Spark Streaming properties do not appear in the official Spark Streaming configuration documentation.

I came across one such property while looking into a timeout exception that took down my block manager but left my receiver running (a baffling behavior that I haven't figured out yet). Another user described the same exception here. I found this website, which covers some of the configurations mentioned there that are missing from Spark's documentation.

Here are the (super secret) properties that the accepted answer suggested checking out:

spark.streaming.driver.writeAheadLog.allowBatching true 
spark.streaming.driver.writeAheadLog.batchingTimeout 15000
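
For anyone who wants to try these, here is a minimal sketch of passing them on the command line, assuming the job is launched with spark-submit and its `--conf key=value` option. The script name `my_streaming_app.py` is hypothetical, and the values are simply the ones quoted above, not recommendations.

```python
# The two properties quoted above; values copied as-is from the question.
wal_props = {
    "spark.streaming.driver.writeAheadLog.allowBatching": "true",
    "spark.streaming.driver.writeAheadLog.batchingTimeout": "15000",
}


def to_submit_args(props):
    """Turn a property mapping into spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in props.items():
        args += ["--conf", f"{key}={value}"]
    return args


submit_args = to_submit_args(wal_props)

# my_streaming_app.py is a placeholder (assumption) for the real application.
print(" ".join(["spark-submit"] + submit_args + ["my_streaming_app.py"]))
```

The same key/value pairs could also go into spark-defaults.conf or be set on the application's SparkConf; the command line is used here only because it needs no code changes.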

Why are these other properties not documented? I've heard this is a common thing in Spark. Is that true?

b15

1 Answer

I can't tell you whether it is true; I can only share my experience, which is that I have not come across many undocumented parts of Spark.

The GitHub book The Internals of Apache Spark by Jacek Laskowski also helped me a lot.

As Spark is open source, you always have the chance to:

  • contribute missing documentation parts
  • scan the source code (although looking for something you don't know exists will be challenging)
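
One way to make the source scan less of a needle-in-a-haystack search is to list every string literal that looks like a `spark.streaming.*` key and compare the result with the documentation. The following is only a rough sketch: the regular expression, the helper names, and the assumption that keys appear as double-quoted literals in `.scala` files are mine, and the sample lines are illustrative rather than copied from the Spark repository.

```python
import re
from pathlib import Path

# Assumption: config keys appear as double-quoted string literals.
KEY_PATTERN = re.compile(r'"(spark\.streaming\.[A-Za-z0-9_.]+)"')


def find_config_keys(text):
    """Return the sorted, de-duplicated spark.streaming.* keys found in text."""
    return sorted(set(KEY_PATTERN.findall(text)))


def scan_checkout(root):
    """Collect keys from every .scala file under a local source checkout."""
    keys = set()
    for path in Path(root).rglob("*.scala"):
        keys.update(find_config_keys(path.read_text(encoding="utf-8", errors="ignore")))
    return sorted(keys)


# Illustrative input only; these lines are made up for the example.
sample = '''
  val batchingKey = "spark.streaming.driver.writeAheadLog.allowBatching"
  val timeoutKey = "spark.streaming.driver.writeAheadLog.batchingTimeout"
  val again = "spark.streaming.driver.writeAheadLog.allowBatching"
'''

for key in find_config_keys(sample):
    print(key)
```

Pointing `scan_checkout` at a local clone of the Spark repository would give a candidate list to diff against the documented properties; keys that are assembled by string concatenation would be missed by this approach.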
Michael Heil