First, I would also index the uri
column and make each column "not nullable":
CREATE TABLE IF NOT EXISTS `uri_history` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user` int(10) unsigned NOT NULL, /* cannot be NULL */
`timestamp` int(10) unsigned NOT NULL, /* cannot be NULL */
`uri` varchar(255) NOT NULL, /* cannot be NULL */
PRIMARY KEY (`id`),
KEY `user` (`user`),
KEY `timestamp` (`timestamp`),
KEY `uri`
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
However, I am a bit bewildered by your timestamp
column having an int(10)
definition and values such as 1389223839. I would expect an integer timestamp to be a value created with the UNIX_TIMESTAMP
function call, but 1389223839 would then represent a value of '2014-01-08 18:30:39' for the 'America/New_York' time zone. I would have expected a sample timestamp to be more "contemporary." But I will have to assume that this column is a Unix timestamp value.
Let's say I was interested in gathering statistics for the month of June of this year:
SELECT * FROM uri_history
WHERE DATE(FROM_UNIXTIME(`timestamp`)) between '2022-06-01' and '2022-06-30'
ORDER BY `uri`, `user`, `timestamp`
From this point on I would process the returned rows in sequence recognizing breaks on the uri
and user
columns. For any returned uri
and user
combination, it should be very simple to compare the successive timestamp values and see if they differ by at least 30 minutes (i.e. 1800 seconds). In Python this would look like:
current_uri = None
current_user = None
current_timestamp = None
counter = None
# Process each returned row:
for row in returned_rows:
uri = row['uri']
user = row['user']
timestamp = row['timestamp']
if uri != current_uri:
# We have a new `uri` column:
if current_uri:
# Statistics for previous uri:
print(f'Visits for uri {current_uri} = {counter}')
current_uri = uri
current_user = user
counter = 1
elif user != current_user:
# We have a new user for the current uri:
current_user = user
counter += 1
elif timestamp - current_timestamp >= 1800:
# New visit is at least 30 minutes after the user's
# previous visit for this uri:
counter += 1
current_timestamp = timestamp
# Output final statistics, if any:
if current_uri:
print(f'Visits for uri {current_uri} = {counter}