I loop over a number of messages, each with a unixtime and a userid, and I want to find the number of messages inside a 24-hour time slot for each user. I posted my code on Code Review to get some help. From there I optimized the
cur.execute('SELECT unixtime FROM MessageType1 WHERE userID ='+str(userID[index])+' ORDER BY unixtime asc')
query, as I found that it accounted for 6.7 s of my code's total 7.2 s running time. I optimized it by creating an index on the unixtime column with CREATE INDEX unixtime_times ON MessageType1 (unixtime asc)
. Now that query takes 0.00117 s as opposed to 6.7 s, but the overall running time of my code has gone from 7.2 s to 15.8 s. Everything is unchanged except for the index. Upon further inspection, it seems that messages = cur.fetchall()
is taking 15.3 s after the index was added. Does anyone have a clue why? Thanks in advance!
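For reference, here is a minimal, self-contained sketch of how I time the two steps separately, using an in-memory database with synthetic data (the real schema and data are assumed to look like this, with a userid and a unixtime column):

```python
import sqlite3
import time

# Minimal reproduction sketch: assumed schema MessageType1(userid, unixtime)
# with synthetic data; the real database is opened from a file instead.
con = sqlite3.connect(':memory:')
cur = con.cursor()
cur.execute('CREATE TABLE MessageType1 (userid INTEGER, unixtime INTEGER)')
cur.executemany('INSERT INTO MessageType1 VALUES (?, ?)',
                [(1, t) for t in range(0, 100000, 10)])
cur.execute('CREATE INDEX unixtime_times ON MessageType1 (unixtime ASC)')

# Time the execute() step on its own...
start = time.perf_counter()
cur.execute('SELECT unixtime FROM MessageType1 WHERE userid = ? '
            'ORDER BY unixtime ASC', (1,))
execute_time = time.perf_counter() - start

# ...and then the fetchall() step, which is where the rows are
# actually materialized.
start = time.perf_counter()
messages = cur.fetchall()
fetch_time = time.perf_counter() - start

print(len(messages), execute_time, fetch_time)
```

This is how I arrived at the per-step numbers above: execute() by itself returns almost immediately, and the time only shows up once fetchall() pulls the rows.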
con = lite.connect(databasepath)
userID = []
messages = []
messageFrequency = []

with con:
    cur = con.cursor()
    # Get all userIDs
    cur.execute('SELECT DISTINCT userid FROM MessageType1')
    userID = cur.fetchall()
    userID = {index: x[0] for index, x in enumerate(userID)}
    # For each userID
    for index in range(len(userID)):
        messageFrequency.append(0)
        # Get all messages with that userID, sorted by unixtime
        cur.execute('SELECT unixtime FROM MessageType1 WHERE userID ='+str(userID[index])+' ORDER BY unixtime asc')
        messages = cur.fetchall()
        messages = {index: x[0] for index, x in enumerate(messages)}
        # Loop through every message
        for messageIndex in range(len(messages)):
            frequency = 0
            message = messages[messageIndex]
            # Count every following message that is within 24 hours
            for nextMessageIndex in range(messageIndex + 1, len(messages)):
                nextmessage = messages[nextMessageIndex]
                if nextmessage < message + (24 * 60 * 60):
                    frequency += 1
                else:
                    break
            # Keep the best count per message in a list that should be plotted
            if messageFrequency[-1] < frequency:
                messageFrequency[-1] = frequency
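In case it matters for the timing question, the nested scan at the end is equivalent to a binary search over the sorted timestamps; this standalone sketch (the helper name is mine, not from my actual code) shows the same 24-hour window count:

```python
from bisect import bisect_left

def max_messages_in_24h(timestamps):
    """Given one user's message times sorted ascending, return the largest
    number of later messages that fall within 24 hours of any one message.

    Equivalent to the inner double loop above: for each message, count the
    following messages with unixtime < message + 24*60*60.
    """
    best = 0
    day = 24 * 60 * 60
    for i, t in enumerate(timestamps):
        # Index of the first timestamp >= t + 24h; everything between
        # position i+1 and that index lies inside the window.
        j = bisect_left(timestamps, t + day)
        best = max(best, j - i - 1)
    return best
```

This replaces the O(n) inner scan per message with an O(log n) lookup, though in my case the slow part is fetchall() rather than this loop.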