Here is a simulation of the basic setup i have: each person can hold multiple possessions.
Persons table:
id name
1 Carl
2 Sam
3 Tom
4 Jack
Possessions table:
possession personId
car 2
shoes 2
shovel 2
tent 3
matches 3
axe 4
I want to generate a random set of possessions belonging to a random set of people, one possession per person.
So, in a non-SQL world I would generate a set of N random people and then pick a random possession for each person in the set. But how do I implement that in SQL semantics?
I thought of getting a random sample of possessions with some variation of:
SELECT * FROM Posessions WHERE 0.01 >= RAND()
And then filtering out duplicate persons, but that is no good as it will favor persons with large number of possessions in the end, and I want each person to have equal chance of being selected.
Is there a canonical way to solve this?
P.S. Person contains ~50000 entities and Possession contains ~2500000 entities, but i only need to perform this sampling once, so it can be somewhat slow.