Hive currently does support non-equi joins. But since the cross product becomes huge, I was wondering what the options are for joining a large fact table (257 billion rows, 37 TB) with a relatively small (8.7 GB) dimension table.
For an equi join I can make it work easily with proper bucketing on the join column(s), using the same number of buckets in both tables for an SMBM (sort-merge bucket map) join, which practically converts it to a map join. But I think this won't be of any advantage for a non-equi join, because the matching values will sit in other buckets, practically triggering a shuffle, i.e. a reduce phase.
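To make the bucketing concern concrete, here is a small Python sketch (not Hive code). It assumes a simplified bucketing rule of `key % NUM_BUCKETS` as a stand-in for Hive's bucket hash. The bucket count, the sample keys, and the `>=` predicate are all made up for illustration. It counts how many matching row pairs end up in different buckets for an equi predicate versus a non-equi predicate.

```python
NUM_BUCKETS = 4  # hypothetical bucket count, shared by both tables

# Made-up join keys, for illustration only.
fact_keys = [1, 2, 5, 6, 9, 10]
dim_keys = [1, 2, 5, 6]


def bucket_of(key: int) -> int:
    # Assumption: simplified stand-in for the bucket hash (key modulo bucket count).
    return key % NUM_BUCKETS


def cross_bucket_matches(predicate) -> int:
    """Count matching (fact, dim) pairs whose two rows live in different buckets."""
    return sum(
        1
        for f in fact_keys
        for d in dim_keys
        if predicate(f, d) and bucket_of(f) != bucket_of(d)
    )


# Equi join: equal keys always hash to the same bucket.
equi = cross_bucket_matches(lambda f, d: f == d)

# Non-equi join (here f >= d): matches can span buckets.
non_equi = cross_bucket_matches(lambda f, d: f >= d)

print("equi join, cross-bucket matches:", equi)
print("non-equi join, cross-bucket matches:", non_equi)
```

With an equality predicate every match is bucket-local, so bucket i of one table only ever needs bucket i of the other. With an inequality predicate some matches cross buckets, which is why a bucket-to-bucket merge alone cannot answer the join.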
If anyone has any thoughts on how to overcome this, please suggest.