
I am broadcasting an RDD with collectAsMap. The input RDD is around 5GB, and we apply some filters before collecting it to a map, but the broadcast fails after running for a long time. I tried with 3GB of data and it also fails; with 100KB of data it succeeds. Are there any settings I am missing? I have tried raising the upper limits on driver memory and executor memory. Note: this is a SparkContext broadcast, not a Spark SQL broadcast.

```scala
val data = sc.broadcast(RDD.collect.toMap)
```
– user0712
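
For what it's worth, a minimal sketch of the driver-side settings that usually govern this pattern. The values are illustrative, not recommendations; the key fact is that `spark.driver.maxResultSize` defaults to 1g, which a multi-GB `collect` will exceed before the broadcast itself even starts:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only. Two driver-side limits typically bite with this pattern:
//  - spark.driver.maxResultSize (default 1g) caps how much data collect()
//    may ship back to the driver;
//  - the driver heap itself must hold the materialized map, and is normally
//    fixed at launch (e.g. spark-submit --driver-memory 16g), not at runtime.
val conf = new SparkConf()
  .setAppName("broadcast-map-sketch")
  .setMaster("local[*]")                    // local master for a self-contained run
  .set("spark.driver.maxResultSize", "8g")  // illustrative value, not a recommendation

val sc = new SparkContext(conf)

// Stand-in pair RDD for the question's filtered 5GB input.
val rdd = sc.parallelize(Seq(("k1", 1), ("k2", 2)))

// collectAsMap() builds the map on the driver directly, skipping the
// intermediate Array that collect.toMap allocates; either way the full
// map must fit in driver memory before (and after) being broadcast.
val data = sc.broadcast(rdd.collectAsMap())
```

Even with the result-size cap raised, the driver heap still has to hold the entire map, so the memory given to the driver at launch matters as much as any runtime setting.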
  • Does this answer your question? [What is the maximum size for a broadcast object in Spark?](https://stackoverflow.com/questions/41045917/what-is-the-maximum-size-for-a-broadcast-object-in-spark) – mazaneicha Aug 31 '22 at 16:08
  • The reference talks about broadcast joins. Are these different? One is a Spark SQL broadcast versus a SparkContext broadcast (a sketch contrasting the two follows these comments). – user0712 Aug 31 '22 at 16:46
  • You're right. The limits seem to be set outside of TorrentBroadcast: https://github.com/apache/spark/blob/branch-3.3/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L139 – mazaneicha Aug 31 '22 at 18:40
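
To make the distinction raised in these comments concrete, a sketch contrasting the two mechanisms (the DataFrames are made up purely for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

val spark = SparkSession.builder()
  .appName("two-broadcasts")
  .master("local[*]") // local master for a self-contained run
  .getOrCreate()
import spark.implicits._

// Made-up DataFrames purely for illustration.
val small = Seq(("k1", 1), ("k2", 2)).toDF("key", "value")
val large = Seq(("k1", "a"), ("k3", "b")).toDF("key", "payload")

// Spark SQL broadcast: a join hint planned through BroadcastExchangeExec,
// which is where the size check linked above is enforced.
val joined = large.join(broadcast(small), "key")

// SparkContext broadcast: an explicit variable shipped to executors via
// TorrentBroadcast; BroadcastExchangeExec is not involved.
val lookup = spark.sparkContext.broadcast(Map("k1" -> 1, "k2" -> 2))
```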

0 Answers