I am using Spark 2.2. I have a join query on a partitioning column, plus some filter conditions on other columns. When I checked the execution plan, I see the following:

  1. It checks for non-null partition columns.

  2. It applies the predicates to the entire table even before joining with the second table. This causes Spark to read all partitions and apply the filters first, then join to get the data, even though my join clause actually hits only one partition. (A minimal sketch of the setup follows below.)
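
For illustration, here is roughly the shape of the query I mean (`facts`, `dims`, `dt`, and `status` are placeholder names, not my real schema):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("partition-join").getOrCreate()

// `facts` is assumed to be partitioned by `dt`; `dims` is a small lookup table.
val facts = spark.table("facts")
val dims  = spark.table("dims")

val joined = facts
  .join(dims, facts("dt") === dims("dt"))   // join on the partition column
  .filter(facts("status") === "ACTIVE")     // filter on a non-partition column

// On Spark 2.2 the physical plan shows the filter pushed into a scan of
// every partition of `facts`, not just the ones the join actually hits.
joined.explain(true)
```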

Why does my query need to scan all partitions? Is there any way to control predicate pushdown in Spark when doing joins?
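One workaround I am considering: as far as I know, Spark 2.2 has no dynamic partition pruning (that arrived in 3.0), so partitions are only pruned when there is a literal predicate on the partition column at planning time. Collecting the join keys and filtering with them explicitly might force that (same placeholder names as above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("pruned-join").getOrCreate()
val facts = spark.table("facts")   // hypothetical: partitioned by `dt`
val dims  = spark.table("dims")    // hypothetical: small lookup table

// Collect the (assumed small) set of partition keys from the dimension side.
val keys = dims.select("dt").distinct().collect().map(_.getString(0))

// A literal IN-list on the partition column is visible at planning time,
// so the scan prunes down to the matching partitions before the join runs.
val pruned = facts
  .filter(col("dt").isin(keys: _*))
  .join(dims, Seq("dt"))
  .filter(col("status") === "ACTIVE")

pruned.explain(true)
```

This only works if the set of keys on the dimension side is small enough to collect to the driver, so I am not sure it is a general answer.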

  • Does this answer your question? [How to prevent predicate pushdown?](https://stackoverflow.com/questions/50336355/how-to-prevent-predicate-pushdown) – Rayan Ral Jun 05 '20 at 05:55
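
For reference, the linked question goes in the opposite direction (preventing pushdown rather than exploiting it); its approach, roughly sketched with the same placeholder names as above:

```scala
// Reusing `spark` from the sketches above.
import org.apache.spark.sql.functions.col

// Caching materializes the data as an InMemoryRelation, so a filter applied
// afterwards runs against the cached rows instead of being pushed down into
// the underlying file scan.
val cachedFacts = spark.table("facts").cache()
cachedFacts.filter(col("status") === "ACTIVE").explain(true)
```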

0 Answers