Chrome History
SQlite 3 database has the urls
table like this:
CREATE TABLE "urls"(
id INTEGER PRIMARY KEY AUTOINCREMENT,
url LONGVARCHAR,
title LONGVARCHAR,
visit_count INTEGER DEFAULT 0 NOT NULL,
typed_count INTEGER DEFAULT 0 NOT NULL,
last_visit_time INTEGER NOT NULL,
hidden INTEGER DEFAULT 0 NOT NULL
)
If you also use Chrome and OS X, you can find it at ~/Library/Application Support/Google/Chrome/Default/History
. Other system can see this post: Where does Chrome save its SQLite database to?
I want to count the nubmers of title
but in separated time span(like a day).
The simplified table is like below:
id | title | last_visit_time |
--------+-------+-------------------+
"240255" "foo" "13197393082183934"
"240256" "foo" "13197393082183934"
"240259" "foo" "13197393103031880"
"251615" "foo" "13201270874907168"
"251616" "foo" "13201270904026269"
"251631" "foo" "13201271376777156"
"258228" "goo" "13203150937196411"
"258229" "goo" "13203150937196411"
"258230" "goo" "13203150937196411"
The last_visit_time
is "formatted as the number of microseconds since January, 1601".
To a given span
(like 86400000000, a day in microseconds), all same title
only count 1 time, because they are in a same period.
The part of pseudocode to count for a distinct title
in Python style is like:
count = 0
all_last_visit_time_sorted = all_last_visit_time.sort() # generally it should be sorted because the `history` data was added into database in time order.
span = 86400000000 # a day in microseconds
current_base_time = all_last_visit_time_sorted[0]
for last_visit_time in all_last_visit_time_sorted:
if last_visit_time - current_base_time > span:
count += 1
current_base_time = last_visit_time
To be more clear, the partition is like:
id | title | last_visit_time |
--------+-------+-------------------+
"240255" "foo" "13197393082183934"
"240256" "foo" "13197393082183934"
"240259" "foo" "13197393103031880"
------------------------------------
"251615" "foo" "13201270874907168"
"251616" "foo" "13201270904026269"
"251631" "foo" "13201271376777156"
------------------------------------
"258228" "goo" "13203150937196411"
"258229" "goo" "13203150937196411"
"258230" "goo" "13203150937196411"
Then this is what I expect:
title | count |
-------+--------+
"foo" | 2 |
"goo" | 1 |
I don't know how to do these with SQL although after google such keywords as window function
,subquery
and so on.
p.s: data sample
INSERT INTO urls VALUES ("460534","https://stackoverflow.com/questions/ask","Ask a Question - Stack Overflow","1","0","13197393082183934","0");
INSERT INTO urls VALUES ("460535","https://stackoverflow.com/questions/ask/wizard","Ask a Question - Stack Overflow","1","0","13197393082183965","0");
INSERT INTO urls VALUES ("460536","https://stackoverflow.com/questions/ask?guided=true","Ask a Question - Stack Overflow","1","0","13197393082184036","0");
INSERT INTO urls VALUES ("460537","https://stackoverflow.com/search?q=%5Bhadoop%5D","Highest Voted 'hadoop' Questions - Stack Overflow","1","0","13196783835451652","0");
INSERT INTO urls VALUES ("460538","https://stackoverflow.com/questions/tagged/hadoop","Highest Voted 'hadoop' Questions - Stack Overflow","1","0","13196783835452653","0");
INSERT INTO urls VALUES ("460539","https://stackoverflow.com/questions/ask","Ask a Question - Stack Overflow","1","0","13407393082183934","0");
INSERT INTO urls VALUES ("460540","https://stackoverflow.com/questions/ask/wizard","Ask a Question - Stack Overflow","1","0","13407393082184035","0");
INSERT INTO urls VALUES ("460541","https://stackoverflow.com/questions/ask?guided=true","Ask a Question - Stack Overflow","1","0","13407393082184236","0");