
I am broadcasting an RDD with collectAsMap. The input RDD is around 5GB, and we apply some filters before collecting it to a map, but the broadcast fails after running for a long time. I tried with 3GB of data and it also fails; with 100KB of data it succeeds. Are there any settings I am missing? I have tried raising the upper limits on driver memory and executor memory. Note: this is a SparkContext broadcast, not a Spark SQL broadcast.

```scala
val data = sc.broadcast(RDD.collect.toMap)
```
– user0712
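
For what it's worth, a minimal sketch of the driver-side settings that usually govern this pattern. The values are illustrative, not recommendations; the key fact is that `spark.driver.maxResultSize` defaults to 1g, which a multi-GB `collect` will exceed before the broadcast itself even starts:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only. Two driver-side limits typically bite with this pattern:
//  - spark.driver.maxResultSize (default 1g) caps how much data collect()
//    may ship back to the driver;
//  - the driver heap itself must hold the materialized map, and is normally
//    fixed at launch (e.g. spark-submit --driver-memory 16g), not at runtime.
val conf = new SparkConf()
  .setAppName("broadcast-map-sketch")
  .setMaster("local[*]")                    // local master for a self-contained run
  .set("spark.driver.maxResultSize", "8g")  // illustrative value, not a recommendation

val sc = new SparkContext(conf)

// Stand-in pair RDD for the question's filtered 5GB input.
val rdd = sc.parallelize(Seq(("k1", 1), ("k2", 2)))

// collectAsMap() builds the map on the driver directly, skipping the
// intermediate Array that collect.toMap allocates; either way the full
// map must fit in driver memory before (and after) being broadcast.
val data = sc.broadcast(rdd.collectAsMap())
```

Even with the result-size cap raised, the driver heap still has to hold the entire map, so the memory given to the driver at launch matters as much as any runtime setting.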
  • Does this answer your question? [What is the maximum size for a broadcast object in Spark?](https://stackoverflow.com/questions/41045917/what-is-the-maximum-size-for-a-broadcast-object-in-spark) – mazaneicha Aug 31 '22 at 16:08
  • The reference talks about broadcast joins. Are these different? One is a Spark SQL broadcast versus a SparkContext broadcast (a sketch contrasting the two follows these comments). – user0712 Aug 31 '22 at 16:46
  • You're right. The limits seem to be set outside of TorrentBroadcast: https://github.com/apache/spark/blob/branch-3.3/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L139 – mazaneicha Aug 31 '22 at 18:40
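
To make the distinction raised in these comments concrete, a sketch contrasting the two mechanisms (the DataFrames are made up purely for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .appName("two-broadcasts")
  .master("local[*]") // local master for a self-contained run
  .getOrCreate()
import spark.implicits._

// Made-up DataFrames purely for illustration.
val small = Seq(("k1", 1), ("k2", 2)).toDF("key", "value")
val large = Seq(("k1", "a"), ("k3", "b")).toDF("key", "payload")

// Spark SQL broadcast: a join hint planned through BroadcastExchangeExec,
// which is where the size check linked above is enforced.
val joined = large.join(broadcast(small), "key")

// SparkContext broadcast: an explicit variable shipped to executors via
// TorrentBroadcast; BroadcastExchangeExec is not involved.
val lookup = spark.sparkContext.broadcast(Map("k1" -> 1, "k2" -> 2))
```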

0 Answers