0

I am currently trying to understand the concept of Salt-ing to counter Skew. Unfortunately, I couldn't find enough information which would help me wrap my head around the concept of Salting in the context of aggregations (E.g. Group By & Window etc) in Spark SQL.

So far I have construed that salted aggregations require 2 passes. Hence, I put together the following code snippet that would represent the first pass. However, I am unable to proceed from there on. Can someone please help me proceed with few examples using Spark SQL queries ?

Pass I:

create temporary view salt1 
as
select cust, item, cast(rand() * 10 as int) as salt
from tab1;

create temporary view salt2
as
select cust, item
from (select cust, 
             item, 
             row_number() over (partition by salt, cust order by purch) as row_num
     from salt1
     )
where row_num = 1;

Thanks for any help.

Matthew
  • 315
  • 3
  • 5
  • 16
  • Does this help? https://stackoverflow.com/questions/58110207/spark-how-does-salting-work-in-dealing-with-skewed-data – shay__ Oct 01 '20 at 19:40
  • I have read this question. However, I want to know how to apply the same thru SQL queries.. – Matthew Oct 02 '20 at 04:42

0 Answers0