I am working with a big dataset (2M+ rows) that looks like the following:
Id TeamId UserId
43 504 722
44 504 727
45 601 300
46 602 722
47 602 727
48 605 300
49 777 300
50 777 301
51 788 400
52 789 400
53 100 727
In this case, TeamId 504 and 602 are the same, 601 matches with 605 but not with 777 (because it has one more person in the team).
My goal is to generate unique IDs for each "unique" team:
Id TeamId UserId UniqueId
43 504 722 0
44 504 727 0
45 601 300 1
46 602 722 0
47 602 727 0
48 605 300 1
49 777 300 2
50 777 301 2
51 788 400 3
52 789 400 3
53 100 727 4
A person can be in a team of 1, like in the case of UserId 727: he's part of team 504 (with UserId 722) and of team 100 (alone). This should generate 2 different unique ids for the two teams.
I cannot groupBy by TeamId only as it will detect TeamId 504 and 602 as different teams, nor I can by UserId because it will not keep track of the teams.
From my understanding, this might be a network problem. I have found a similar query to this here: Groupby two column values and create a unique id
How can I achieve this? Any help would be appreciated.