I have 2 tables:
db1.main_table (32 GB)
db2.lookup_table (2.5 KB)
lookup table has just one column named id, which is also present and is the primary key of main_ table. The goal is to just look at the values in lookup table and delete from main_table all the rows which has these values present. I am, using this hive query(on TEZ) which has suddenly started creating a cross product at Reduce stage.
insert overwrite table
db1.main_table
select * from db1.main_table where nvl(id,'NV') not in (select nvl(id,'RV') from db2.lookup_table);
I am using nvl due to the presence of null values in my main table for id column which I don't want to loose.
My query is getting hung up at Reducer 2(only 3 containers) forever.
I am getting this warning for Reducer 2
INFO : Warning: Shuffle Join MERGEJOIN[34][tables = [$hdt$_0, $hdt$_1]] in Stage 'Reducer 2' is a cross product
And I am getting the following plan for this query which gets hanged at Reducer 2 vertex of TEZ.
Can we please suggest a way by which this Reducer 2 can get more containers or a way to solve this very long running job. Solution would be highly appreciated.