
I have an Excel file that I'm taking data from. The data is basically a table of a day and the people playing together:

day pl1 pl2
Mon 1000 1011
Tue 1100 0101
Tue 1000 0121
Wed 0101 1101
Mon 0210 1212
Wed 1000 0101

I want to check that no id is playing more than once (regardless of whether it is in column pl1 or pl2) on one day. For example, on Wed we have "0101" two times, once as pl1 and once as pl2, and I want to catch this.

And I'm looking for the fastest and most Pythonic way.

  1. I have thought of checking all elements of the pl1 list and the pl2 list, and if I find the same value anywhere, checking whether the value in the day column is the same or not. However, not only do I think this would be extremely slow to process, I also think it is more complicated to check.

  2. The other thought is to move them into lists of lists [[Mon,1000,1012],[Tue,1110,0101]...], group them by day, and then check the rest of the elements. That still seems to take too much time.

  3. Should I create a list of tuples instead? (I already use tuples to check that there are no identical pairs, i.e. the same id in the same position pl1 and pl2.)

Is there a faster and more compact way?

Thanks

ToErotimatiko

1 Answer


I like this question, it's a nice little brain teaser!

There are probably module tools out there in pandas that can do something sql-like with a self join or the like, but I wanted to see if it could easily be done with just core Python.

The first way that occurred to me was to create a list of tuples, where each tuple is a pair of values: the first being the day and the second being one of the players. Each line of values then generates 2 tuples, one for player 1 and one for player 2. I.e., the first line of data:

Mon,1000,1011

would be put into 2 tuples:

('Mon',1000),('Mon',1011)

Add all of those tuples to a list. Then search the list for duplicates, which is a common enough task. Assuming the player data was in a csv file called "players.csv" like this:

day,pl1,pl2
Mon,1000,1011
Tue,1100,0101
Tue,1000,0121
Wed,0101,1101
Mon,0210,1212
Wed,1000,0101

the code below should return the information you are looking for:

# import data, skipping the header line
with open('players.csv', 'r') as playerFile:
    playerData = playerFile.readlines()[1:]

# create list of tuples, 1 tuple for each player in line of data
plays = []
for line in playerData:
    day, p1, p2 = line.strip().split(',')
    plays.extend([(day, p1), (day, p2)])

# now check the list for duplicate tuples
# since each tuple is a day that a player played, 
# if a player played twice on 1 day, it'll be a duplicate
seen = set()
dupes = set()

for play in plays:
    if play in seen:
        dupes.add(play)
    else:
        seen.add(play)
        
# print duplicates
for dupe in dupes:
    print(dupe)

Output:

('Wed', '0101')

(Note that I had to use tuples instead of lists for each "play", since I wanted to identify each duplicate only once; i.e., if player 0101 played 3 times on one day, I only wanted one item. A set does that nicely, but you can't add lists to sets since lists are mutable.)
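The same seen/dupes logic can be written more compactly with `collections.Counter`, which counts each (day, player) pair in one pass (a sketch, using the sample data inlined as a list of tuples rather than read from the file):

```python
from collections import Counter

rows = [
    ("Mon", "1000", "1011"),
    ("Tue", "1100", "0101"),
    ("Tue", "1000", "0121"),
    ("Wed", "0101", "1101"),
    ("Mon", "0210", "1212"),
    ("Wed", "1000", "0101"),
]

# Count each (day, player) pair; a count > 1 means that player
# appeared more than once on that day, in either column.
counts = Counter((day, p) for day, p1, p2 in rows for p in (p1, p2))
dupes = [play for play, n in counts.items() if n > 1]
print(dupes)  # → [('Wed', '0101')]
```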

RNGHatesMe
  • Wow, nice idea! For my beginner, self-taught mind, this is a very understandable concept! :) It makes sense! And I should have thought of this, because I already use tuples to check that no pair plays together more than once in the same position! Thank you!!! – ToErotimatiko Dec 18 '21 at 13:05
  • I like questions like these because they are very open-ended, and there are many perfectly fine ways to approach them. I challenged my son to look at it (without seeing my approach) to see what he would do, and he suggested creating a list of players for each day, then checking each day for duplicates. However, he agreed that this approach was better, as there was only a single list to check for duplicates. I'm originally self-taught myself ;-). In some ways I think it's better, as it leaves us freer to find original approaches! – RNGHatesMe Dec 18 '21 at 13:46
  • Also, this question is very "database-y", and would lend itself well to database-type solutions. If you are going to be dealing with this type of data a lot, you should look into getting familiar with SQL. – RNGHatesMe Dec 18 '21 at 13:51