I want to randomly shuffle the values for one single column of a table based on a groupby. E.g., I have two columns A and B. Now, I want to randomly shuffle column B based on a groupby on A.
For an example, suppose that there are three distinct values in A. Now for each distinct value of A, I want to shuffle the values in B, but just with values having the same A.
Example input:
A B C
-------------------
1 1 x
1 3 a
2 4 c
3 6 d
1 2 a
3 5 v
Example output:
A B C
------------------
1 3 x
1 2 a
2 4 c
3 6 d
1 1 a
3 5 v
In this case, for A=1
the values for B got shuffled. The same happened for A=2
, but as there is only one row it stayed like it was. For A=3
by chance the values for B also stayed like they were. The values for column C stay as they are.
Maybe this can be solved by using window functions, but I am unsure how exactly.
As a side note: This should be achieved in Google's BigQuery.