mysql subquery and python sets

Question

so I run my mysql scripts through a python wrapper, the important parts are here though. My goal through this was to see which was faster, to make two separate select DISTINCT() calls and find the ones the ones in list a that are not in list b. I also did this using subqueries.

I found that the subqueries were much faster (as hoped) however, the results it yielded were wrong.

starttime1 = time.time()
LoLs.cursor.execute("SELECT DISTINCT(Id) FROM participants")
drop = cursor.fetchall()
drop = [x[0] for x in drop]
LoLs.cursor.execute("SELECT DISTINCT(Id) FROM history")
exist = cursor.fetchall()
exist = [x[0] for x in exist]
drop1 = list(set(drop)-set(exist))
endtime1 = time.time()
print len(drop1)

print endtime1 - starttime1

this results in:

545594
205.518273115

as expected.

However:

starttime2 = time.time()
cursor.execute("SELECT DISTINCT(Id) FROM participants where Id NOT IN (SELECT DISTINCT(Id) FROM history)")
drop2 = cursor.fetchall()
drop2 = [x[0] for x in drop2]
drop2 = list(set(drop2))
endtime2 = time.time()
print len(drop2)

print endtime2 - starttime2

results in:

427333
185.753734112

It's much faster, awesome, but it results in only 427333 entries instead of 545594.

as a note i did a set difference for the two end lists print len(list(set(drop1)-set(drop2)))

results in: 118261

so there are 118261 unique entries in the first list that are not in the second. I even took some of these and checked if they were in the history table manually, and they were, so the subquery should've caught them.

NOTE March 6th: changed:

 cursor.execute("SELECT DISTINCT(Id) FROM participants where summonerId NOT IN (SELECT DISTINCT(Id) FROM history)")

to

 cursor.execute("SELECT DISTINCT(Id) FROM participants where Id NOT IN (SELECT DISTINCT(Id) FROM history)")

to accurately reflect the real problem.

score 1 · Answer 1 · edited May 23 '17 at 12:07

1

NOT IN queries remove nulls. Try instead:

cursor.execute("""SELECT DISTINCT(Id) 
                  FROM participants p
                  WHERE NOT EXISTS (SELECT 1
                                    FROM history h
                                    WHERE h.Id = p.summonerId)""")

edited May 23 '17 at 12:07

Community

1
1

answered Mar 05 '16 at 00:46

mechanical_meat

163,903
24
228
223

still the same issue because both id's are required non null. so nulls don't exist – Matt Mar 06 '16 at 15:57

mysql subquery and python sets

1 Answers1