I want to group users by the IPs (or some other data) they share. Given a table of IP (data) / user_id pairs like the example below, start with the first data value and collect the users who used it (data = 1, users = (a,b,c)). Then gather the other data values used by those users (users = (a,b,c), used_data = (2,4,5)), then the users of those values, and so on. This continues until all users and data values linked this way have been discovered.
Example data (CSV; I substituted random values for the IPs to make it easier to read):
data,user_id
1,a
1,b
1,c
2,a
2,e
3,d
3,h
4,a
5,b
5,f
5,g
6,h
6,i
In short, I want to group together users who share at least one data value, either directly or through a chain of other users.
Expected output in CSV:
group,data,user_id
1,[1,2,4,5],[a,b,c,e,f,g]
2,[3,6],[d,h,i]
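One way to read this is as a connected-components problem: treat every data value and every user as a node, treat each row as an edge between them, and each group is then one component. Below is a minimal sketch in Python using a union-find structure. It is only an illustration under some assumptions: the function name `group_linked` is made up for this example, the sample rows are embedded as a string rather than read from a file, and values are kept as strings (so the sorting is lexicographic, which would need a numeric key for multi-digit data or real IPs).

```python
import csv
import io

# Sample rows from the question, embedded as text for the sketch.
CSV_TEXT = """data,user_id
1,a
1,b
1,c
2,a
2,e
3,d
3,h
4,a
5,b
5,f
5,g
6,h
6,i
"""


def group_linked(rows):
    """Group (data, user_id) pairs into connected components.

    Returns a list of (sorted_data, sorted_users) tuples, ordered by the
    first row in which each group appears.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    rows = list(rows)
    # Tag nodes so a data value and a user with the same label never collide.
    for data, user in rows:
        union(("data", data), ("user", user))

    groups = {}  # root node -> (set of data values, set of users)
    for data, user in rows:
        data_set, user_set = groups.setdefault(find(("data", data)), (set(), set()))
        data_set.add(data)
        user_set.add(user)
    return [(sorted(d), sorted(u)) for d, u in groups.values()]


reader = csv.DictReader(io.StringIO(CSV_TEXT))
rows = [(r["data"], r["user_id"]) for r in reader]

print("group,data,user_id")
for i, (data, users) in enumerate(group_linked(rows), start=1):
    print(f"{i},[{','.join(data)}],[{','.join(users)}]")
```

Union-find avoids the repeated "expand users, then expand data" passes described above, since each row is processed once to merge and once to collect. A breadth-first search over the same bipartite graph would give the same groups.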