
I have a table named passive that contains a list of timestamped events per user. I want to fill the attribute duration, which corresponds to the time between the current row's event and the next event by the same user.

I tried the following query:

UPDATE passive as passive1
SET passive1.duration = (
    SELECT min(UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) )
    FROM passive as passive2
    WHERE passive1.user_id = passive2.user_id 
    AND UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) > 0
);

This returns the error message: Error 1093 - You can't specify target table for update in FROM clause.

In order to circumvent this limitation, I tried to follow the structure given in https://stackoverflow.com/a/45498/395857, which uses a nested subquery in the FROM clause to create an implicit temporary table, so that it doesn't count as the same table we're updating:

UPDATE passive 
SET passive.duration = (

    SELECT *
    FROM (SELECT min(UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive.event_time)) 
        FROM passive, passive as passive2
        WHERE passive.user_id = passive2.user_id 
        AND UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive.event_time) > 0
        )
    AS X
);

However, the passive table in the nested subquery doesn't refer to the same passive as in the main query. Because of that, all rows have the same passive.duration value. How can I refer to the main query's passive in the nested subquery? (or maybe are there some alternative ways to structure such a query?)
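For reference, the correlated-subquery form of this UPDATE is valid SQL; the restriction is specific to MySQL. A minimal sketch of the intended behavior, using SQLite (which allows the target table inside the subquery) and hypothetical sample data:

```python
import sqlite3

# In-memory SQLite stand-in for the `passive` table (hypothetical data;
# event_time kept as a plain Unix timestamp for simplicity).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passive (user_id INTEGER, event_time INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO passive (user_id, event_time) VALUES (?, ?)",
    [(1, 100), (1, 160), (1, 190), (2, 50), (2, 80)],
)

# The correlated form that triggers error 1093 in MySQL runs unchanged here:
# for each row, take the smallest positive gap to a later event by the same user.
conn.execute("""
    UPDATE passive SET duration = (
        SELECT MIN(p2.event_time - passive.event_time)
        FROM passive AS p2
        WHERE p2.user_id = passive.user_id
          AND p2.event_time > passive.event_time
    )
""")
rows = conn.execute(
    "SELECT user_id, event_time, duration FROM passive ORDER BY user_id, event_time"
).fetchall()
print(rows)  # each user's last event gets NULL (no next event)
```

This confirms the logic of the original query is sound; only MySQL's single-table UPDATE restriction gets in the way.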

2 Answers


Try like this:

UPDATE passive as passive1
SET passive1.duration = (
    SELECT min(UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) )
    FROM (SELECT * from passive) passive2
    WHERE passive1.user_id = passive2.user_id 
    AND UNIX_TIMESTAMP(passive2.event_time) - UNIX_TIMESTAMP(passive1.event_time) > 0
    )
;
  • Thanks a lot, it indeed circumvents error 1093. However, doesn't `(SELECT * from passive)` slow the query down? (The table `passive` is 6 GB.) Would you have any advice to speed up the query? – Franck Dernoncourt Jun 04 '13 at 04:42
  • @FranckDernoncourt from the query it looks like you have multiple entries for the same user id, so you can try `Select Max(event_time),user_id from passive group by user_id`... I think it will reduce the size of the table a little bit. – Amit Singh Jun 04 '13 at 05:13
  • Thanks, `Max(event_time)` would change the results, but `SELECT event_time, user_id from passive) Passive2` indeed reduces the run time a bit (by around 10%). I kind of feel this kind of operation doesn't fit well in SQL. – Franck Dernoncourt Jun 04 '13 at 05:42
  • @FranckDernoncourt are you sure it changes the result? I mean, have you tried it? I think it should work. – Amit Singh Jun 04 '13 at 05:45
  • Yep I tried but it doesn't work as we want the next event for the user, not the last one. The query I tried was: `UPDATE passive as passive1 SET passive1.duration = ( SELECT min(UNIX_TIMESTAMP(passive2.event_time_max) - UNIX_TIMESTAMP(passive1.event_time) ) FROM (Select Max(event_time) as event_time_max,user_id from passive group by user_id) Passive2 WHERE passive1.user_id = passive2.user_id AND UNIX_TIMESTAMP(passive2.event_time_max) - UNIX_TIMESTAMP(passive1.event_time) > 0 ) WHERE passive1.user_id < 2 ;` – Franck Dernoncourt Jun 04 '13 at 05:59
  • @FranckDernoncourt `Select Max(event_time),user_id from passive a WHERE A.USER_ID – Amit Singh Jun 04 '13 at 06:03
  • I tried `(Select Max(event_time),user_id from passive a WHERE a.user_id < passive1.user_id group by user_id)` but indeed passive1 isn't accessible. Is it possible to force MySQL to cache `(SELECT * from passive)`? – Franck Dernoncourt Jun 04 '13 at 06:07
  • I don't have much idea about MySQL... I've worked on MS SQL Server... it might be possible, try to find out... maybe you can find a better way than this. – Amit Singh Jun 04 '13 at 06:17
  • I eventually used a Python script to circumvent the issue, the problem is now solved, thanks for the help! – Franck Dernoncourt Jun 16 '13 at 17:17
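As a side note, on MySQL 8.0+ (not available when this was asked) the `LEAD()` window function computes the next event per user directly, with no self-join at all. A sketch of that idea, again using SQLite (3.25+, which has the same window function) with hypothetical data:

```python
import sqlite3

# In-memory stand-in for the `passive` table (hypothetical data).
conn = sqlite3.connect(":memory:")  # requires SQLite 3.25+ for window functions
conn.execute("CREATE TABLE passive (user_id INTEGER, event_time INTEGER)")
conn.executemany(
    "INSERT INTO passive (user_id, event_time) VALUES (?, ?)",
    [(1, 100), (1, 160), (2, 50), (2, 80)],
)

# LEAD() fetches the next event_time within each user's timeline in one pass,
# so the duration is just "next event_time minus this one".
rows = conn.execute("""
    SELECT user_id, event_time,
           LEAD(event_time) OVER (
               PARTITION BY user_id ORDER BY event_time
           ) - event_time AS duration
    FROM passive
    ORDER BY user_id, event_time
""").fetchall()
print(rows)  # duration is NULL for each user's last event
```

On MySQL 8.0+ the same `SELECT` can feed a multi-table `UPDATE ... JOIN` to fill the duration column.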

We can use a Python script to circumvent the issue:

(An index on (user_id, observed_event_timestamp) is needed to speed up the per-row lookup.)

#!/usr/bin/python
# -*- coding: utf-8 -*-

# Download it at http://sourceforge.net/projects/mysql-python/?source=dlp
# Tutorials: http://mysql-python.sourceforge.net/MySQLdb.html
#            http://zetcode.com/db/mysqlpython/
import MySQLdb

import datetime

def main():
    start = datetime.datetime.now()

    db=MySQLdb.connect(user="root",passwd="password",db="db_name")
    db2=MySQLdb.connect(user="root",passwd="password",db="db_name")

    cursor = db.cursor()
    cursor2 = db2.cursor()

    cursor.execute("SELECT observed_event_id, user_id, observed_event_timestamp FROM observed_events ORDER BY observed_event_timestamp ASC")

    count = 0
    for row in cursor:
        count += 1
        timestamp = row[2]
        user_id = row[1]
        primary_key = row[0]
        sql = 'SELECT observed_event_timestamp FROM observed_events WHERE observed_event_timestamp > "%s" AND user_id = "%s" ORDER BY observed_event_timestamp ASC LIMIT 1' % (timestamp, user_id)
        cursor2.execute(sql)
        duration = 0
        for row2 in cursor2:
            duration = (row2[0] - timestamp).total_seconds()
            if (duration > (60*60)):
                duration = 0
                break

        cursor2.execute("UPDATE observed_events SET observed_event_duration=%s WHERE observed_event_id = %s" % (duration, primary_key))

        if count % 1000 == 0:
            db2.commit()
            print "Percent done: " + str(float(count) / cursor.rowcount * 100) + "%" + " in " + str((datetime.datetime.now() - start).total_seconds()) + " seconds."

    db.close()
    db2.close()
    diff = (datetime.datetime.now() - start).total_seconds()
    print 'finished in %s seconds' % diff

if __name__ == "__main__":
    main()
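The script above issues one SELECT per event. If the outer query is instead ordered by (user_id, timestamp), the same durations can be computed in a single pass over the rows, which avoids the per-row round trip entirely. A minimal in-memory sketch of that idea (hypothetical sample data; same one-hour cap as the script):

```python
# Compute time-to-next-event per user in one pass over rows that are
# pre-sorted by (user_id, event_time), instead of one SELECT per row.
def next_event_durations(rows, cap=60 * 60):
    """rows: iterable of (event_id, user_id, event_time) tuples, sorted by
    (user_id, event_time). Returns {event_id: duration_in_seconds}, with 0
    for gaps over `cap` and for each user's last event, as in the script."""
    durations = {}
    prev = None  # previous (event_id, user_id, event_time)
    for event_id, user_id, event_time in rows:
        if prev is not None and prev[1] == user_id:
            gap = event_time - prev[2]
            durations[prev[0]] = gap if gap <= cap else 0
        durations[event_id] = 0  # stays 0 only if this is the user's last event
        prev = (event_id, user_id, event_time)
    return durations

events = [  # hypothetical sample data: (event_id, user_id, unix_time)
    (1, 1, 100), (2, 1, 160), (3, 1, 10000), (4, 2, 50), (5, 2, 80),
]
print(next_event_durations(events))
```

The results can then be written back in batched UPDATEs, committing every few thousand rows as the script already does.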