Below is an analogic schema
Table 1 - Players (Upto 1M players per team, HUGE)
player_number - INT
team_id - bigint (indexed)
Table 2 - Team (Relatively small table which I want to broadcast)
team_id - bigInt (indexed)
team_size - INT
Below is my query which is using sort-merge join which is becoming slow
select id as teamId , teamSize from teams
left outer join (select teamId, count(1) as size from players group by teamId) as players
on (teams.id= players.teamId) where players.size != teams.teamSize;
The table players can be quite huge. There can be as many as 1M players in a team. The table players are partitioned by team_id so the above query can run by itself on a single partition.
How do I broadcast teams table which is relatively smaller to use broadcast join instead of sort-merge join.