11

Working on a script to collect users browser history with time stamps ( educational setting). Firefox 3 history is kept in a sqlite file, and stamps are in UNIX epoch time... getting them and converting to readable format via a SQL command in python is pretty straightforward:

sql_select = """ SELECT datetime(moz_historyvisits.visit_date/1000000,'unixepoch','localtime'), 
                        moz_places.url 
                 FROM moz_places, moz_historyvisits 
                 WHERE moz_places.id = moz_historyvisits.place_id
             """
get_hist = list(cursor.execute (sql_select))

Chrome also stores history in a sqlite file.. but it's history time stamp is apparently formatted as the number of microseconds since midnight UTC of 1 January 1601....

How can this timestamp be converted to a readable format as in the Firefox example (like 2010-01-23 11:22:09)? I am writing the script with python 2.5.x ( the version on OS X 10.5 ), and importing sqlite3 module....

Jason Coon
  • 17,601
  • 10
  • 42
  • 50
cit
  • 2,465
  • 5
  • 28
  • 36
  • so to be clear, you just want to convert a datetime field returned from the recordset to a readable string? – Jason Coon Jan 26 '10 at 18:14
  • That's correct..that bit in the Firefox example gives me back a stamp like: 2010-01-23 11:22:09 ... Looking to do the same for Chrome history logs... – cit Jan 26 '10 at 18:25
  • Relevant question: https://stackoverflow.com/questions/20458406/what-is-the-format-of-chromes-timestamps – GDP2 Nov 24 '21 at 01:59

4 Answers4

16

Try this:

sql_select = """ SELECT datetime(last_visit_time/1000000-11644473600,'unixepoch','localtime'),
                        url 
                 FROM urls
                 ORDER BY last_visit_time DESC
             """
get_hist = list(cursor.execute (sql_select))

Or something along those lines

seems to be working for me.

Squiqqly
  • 161
  • 1
  • 4
  • Why `11644473600`? It is working, but what does the big number mean? – polym Aug 23 '14 at 21:31
  • Ok I've found what the number means. It's the "number of seconds between 1 Jan 1970 and 1 Jan 1601", according to this site: http://timothycomeau.info/chrome-history/ – polym Aug 23 '14 at 21:54
4

This is a more pythonic and memory-friendly way to do what you described (by the way, thanks for the initial code!):

#!/usr/bin/env python

import os
import datetime
import sqlite3
import opster
from itertools import izip

SQL_TIME = 'SELECT time FROM info'
SQL_URL  = 'SELECT c0url FROM pages_content'

def date_from_webkit(webkit_timestamp):
    epoch_start = datetime.datetime(1601,1,1)
    delta = datetime.timedelta(microseconds=int(webkit_timestamp))
    return epoch_start + delta

@opster.command()
def import_history(*paths):
    for path in paths:
        assert os.path.exists(path)
        c = sqlite3.connect(path)
        times = (row[0] for row in c.execute(SQL_TIME))
        urls  = (row[0] for row in c.execute(SQL_URL))
        for timestamp, url in izip(times, urls):
            date_time = date_from_webkit(timestamp)
            print date_time, url
        c.close()

if __name__=='__main__':
    opster.dispatch()

The script can be used this way:

$ ./chrome-tools.py import-history ~/.config/chromium/Default/History* > history.txt

Of course Opster can be thrown out but seems handy to me :-)

Andy Mikhailenko
  • 1,571
  • 1
  • 11
  • 10
  • P.S.: I would now recommend Argh instead of Opster. – Andy Mikhailenko Sep 03 '11 at 18:41
  • Awesome - thanks for the time saver :o) Here's an updated Python script, based on yours, that works with Chrome/Chromium version 38. I've only tested this on MacOS, but it should work on Linux and Windows as well. https://gist.github.com/boris-chervenkov/84b7c3f06bb94c106ba8 – Boris Chervenkov Jan 06 '15 at 14:30
1

The sqlite module returns datetime objects for datetime fields, which have a format method for printing readable strings called strftime.

You can do something like this once you have the recordset:

for record in get_hist:
  date_string = record[0].strftime("%Y-%m-%d %H:%M:%S")
  url = record[1]
Jason Coon
  • 17,601
  • 10
  • 42
  • 50
  • I guess I am looking for the lazy man's way out -- hoping to just replace the 'unixepoch' bit in the code for Firefox so it'll work with Chrome... If Chrome used unixepoch I'd be all set and I could reuse the line ( first changing the moz stuff to Chrome's tables)..I'm just stuck on how to convert that time format ( I have also seen it referred to as 'WebKit' time format.)... – cit Jan 26 '10 at 18:57
  • 1
    So write a function to convert the date. Then you can do `date = get_webkit_date(record[0])`. That *is* the lazy man's way out. – jcdyer Jan 26 '10 at 19:05
  • There's also a strptime if you want to go the other way (string to datetime object). See the [python datetime docs](http://docs.python.org/library/datetime.html) for more details on how to do that. – jcdyer Jan 26 '10 at 19:06
  • Correction: You probably want the .fromtimestamp() method – jcdyer Jan 26 '10 at 19:08
  • I guess I have no idea what the function would look like to convert to WebKit time ... Is there not something built into sqlite that can be called like the unixepoch modifier to do this? I am not very familiar with sqlite3 -- It looks like .fromtimestamp() wouldn't work since it would only handle dates back to 1970? – cit Jan 26 '10 at 19:29
1

This may not be the most Pythonic code in the world, but here's a solution: Cheated by adjusting for time zone (EST here) by doing this:

utctime = datetime.datetime(1601,1,1) + datetime.timedelta(microseconds = ms, hours =-5)

Here's the function : It assumes that the Chrome history file has been copied from another account into /Users/someuser/Documents/tmp/Chrome/History

def getcr():
    connection = sqlite3.connect('/Users/someuser/Documents/tmp/Chrome/History')
    cursor = connection.cursor()
    get_time = list(cursor.execute("""SELECT last_visit_time FROM urls"""))
    get_url = list(cursor.execute("""SELECT url from urls"""))
    stripped_time = []
    crf = open ('/Users/someuser/Documents/tmp/cr/cr_hist.txt','w' )
    itr = iter(get_time)
    itr2 = iter(get_url)

    while True:
        try:
            newdate = str(itr.next())
            stripped1 = newdate.strip(' (),L')
            ms = int(stripped1)
            utctime = datetime.datetime(1601,1,1) + datetime.timedelta(microseconds = ms, hours =-5)
            stripped_time.append(str(utctime))
            newurl = str(itr2.next())
            stripped_url = newurl.strip(' ()')
            stripped_time.append(str(stripped_url))
            crf.write('\n')
            crf.write(str(utctime))
            crf.write('\n')
            crf.write(str(newurl))
            crf.write('\n')
            crf.write('\n')
            crf.write('********* Next Entry *********') 
            crf.write('\n')
        except StopIteration:
            break

    crf.close()            

    shutil.copy('/Users/someuser/Documents/tmp/cr/cr_hist.txt' , '/Users/parent/Documents/Chrome_History_Logs')
    os.rename('/Users/someuser/Documents/Chrome_History_Logs/cr_hist.txt','/Users/someuser/Documents/Chrome_History_Logs/%s.txt' % formatdate)
cit
  • 2,465
  • 5
  • 28
  • 36