I encountered a Kryo buffer overflow exception, but I really don't understand what data could require more than the current buffer size. I already have spark.kryoserializer.buffer.max set to 256 MB, and even the output of toString applied to the dataset items, which should be much bigger than what Kryo requires, takes less than that (per item).

I know I can increase this parameter, and I will right now, but I don't think it is good practice to simply increase resources when hitting a limit without investigating what is happening (just as I wouldn't respond to an OOM by simply increasing the RAM allocation without checking what is consuming the memory).

=> So, is there a way to investigate what is put in the buffer during the execution of the Spark DAG?

I couldn't find anything in the Spark UI.

Note that "How Kryo serializer allocates buffer in Spark" is not the same question. It asks how the buffer works (and actually no one answers that), whereas I am asking how to investigate. In that question, all the answers discuss which parameters to use. I know which parameter to use, and I do manage to avoid the exception by increasing it. However, I already consume too much RAM and need to optimize it, Kryo buffer included.

Juh_

1 Answer

All data that is sent over the network, written to disk, or persisted in memory in serialized form has to be serialized during the execution of the Spark DAG. Hence, the Kryo serialization buffer (spark.kryoserializer.buffer.max) must be larger than any object you attempt to serialize, and must be less than 2048m.

https://spark.apache.org/docs/latest/tuning.html#data-serialization
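
For reference, the two Kryo buffer settings involved could be written in spark-defaults.conf roughly as follows. The 256m value is the one mentioned in the question, and 64k is Spark's documented default for the initial buffer. Enabling Kryo through spark.serializer is an assumption here, since the question does not show how the serializer is configured.

```
# Assumed: Kryo is enabled as the serializer for the job
spark.serializer                 org.apache.spark.serializer.KryoSerializer
# Initial per-core buffer size (Spark default), grows up to buffer.max on demand
spark.kryoserializer.buffer      64k
# Upper bound from the question; must stay below 2048m
spark.kryoserializer.buffer.max  256m
```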

  • Is there a way to look at the size of the largest object in the Spark application UI? How else could we find out the actual size of the largest object? – Omkar Neogi Jul 22 '22 at 06:55
  • Thanks for the answer, but it doesn't answer my question. Maybe I wasn't clear, but all the objects that I think could be serialized are much smaller than the current buffer max size. That means I don't know what is being serialized, and this troubles me. As Omkar asked, knowing the biggest object would already be a great help – Juh_ Jul 27 '22 at 13:11
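
One possible way to approach the "largest object" question raised in the comments is to serialize each record with a separate Kryo instance and report the biggest sizes. The sketch below is only an illustration under several assumptions: the object name KryoSizeProbe and the method largestRecords are hypothetical, the probe uses its own SparkConf with a 1g limit (so any registrator or registration settings of the real job are not applied and sizes may differ somewhat), and it only measures individual records of one RDD. If the overflow comes from something else, such as an aggregated value, a broadcast value, or a task result, this probe would not show it.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.serializer.KryoSerializer
import scala.reflect.ClassTag

// Hypothetical helper, not part of Spark: measures Kryo-serialized record sizes.
object KryoSizeProbe {

  /** Returns the `top` largest (sizeInBytes, className) pairs found in `rdd`. */
  def largestRecords[T: ClassTag](rdd: RDD[T], top: Int = 5): Array[(Int, String)] = {
    val sizes: RDD[(Int, String)] = rdd.mapPartitions { records =>
      // Assumption: a standalone conf with a generous limit, so the probe can
      // measure records that would overflow the job's real buffer.
      val probeConf = new SparkConf(false)
        .set("spark.kryoserializer.buffer.max", "1g")
      // One serializer instance per partition (instances are not thread-safe).
      val ser = new KryoSerializer(probeConf).newInstance()
      records.map { r =>
        // serialize returns a ByteBuffer; remaining() is the serialized length.
        val bytes = ser.serialize(r).remaining()
        (bytes, r.getClass.getName)
      }
    }
    // Keep only the largest entries, ordered by serialized size.
    sizes.top(top)(Ordering.by[(Int, String), Int](_._1))
  }
}
```

With a Dataset named ds (a placeholder name), the call would look like KryoSizeProbe.largestRecords(ds.rdd). Running it on the intermediate result just before the stage that fails is likely to be more informative than running it on the input data.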