I have 2 table like this:
Table A: ind , value1
(1,a)(2,b)(3,c) ... 3 billion rows
Table B: start_ind,end_ind,value2
(3,20,c)(78,99,y)(88,156,z) ... 500 million rows
And i write this query in spark sql
Select /*+ BROADCAST(B) */ A.ind,A.value1,B.value2
From A join B on A.ind between B.start_ind and B.end_ind;
I give 5 executor cores , 700 num of executor, 50gb memory to executor .
The query never ends
How can i improve the performance