0

I am building a multi-tenant application in which, whenever a user fires a particular event, I save the event's start time in the database (SQLite). To determine the peak request time, I am trying to find the mode of the saved timestamps. This is not to be confused with the average, which would just give me the mean of all timestamps; I am looking for a range-like result that reflects the peak, e.g. "most events are fired between 2 PM and 4 PM". Timestamps are stored as string values in YYYY-MM-DDTHH:NN:SS format.

I am having trouble writing a query that solves this.

Rohan
  • Have you searched SO? http://stackoverflow.com/questions/15763965/how-can-i-calculate-the-median-of-values-in-sqlite – Smutje Mar 03 '14 at 09:10
  • @Smutje, yes, I've had a look at that. The response to that question is: sort the elements, find the ones in the middle and then calculate the average of the two. What I am looking for instead is not an average, but a range of, let's say, 1-2 hours. – Rohan Mar 03 '14 at 09:36
  • Rohan, I guess what you are trying to find here is the mode, but that's just terminology and immaterial as long as your requirement is conveyed to others. – Dipendu Paul Mar 03 '14 at 09:47
  • @DipenduPaul you are right. I've amended the question. Thanks :) – Rohan Mar 03 '14 at 09:56
  • I think the question was about the median of a date. Did you change the question? – Hamidreza Mar 03 '14 at 10:58
  • @Hamidreza, yes I had changed it from Median to Mode – Rohan Mar 03 '14 at 11:33

3 Answers

1

The algorithm should be as follows:

  1. Choose the duration of the range, say 1 hour or 2 hours.
  2. For each timestamp, determine which range it belongs to. For example, if you have selected a 1-hour range and a timestamp is 5:09 pm, it belongs to the 5:00 pm - 6:00 pm range.
  3. Group by range and count the number of timestamps falling in each range.
  4. Select the range with the highest count.

Here is a sample query that finds the mode of the timestamps when the range duration is 1 hour:

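SELECT HourStart AS RangeStart, HourStart + 1 AS RangeEnd FROM
(
   -- Bucket each event by the hour of its start time (0-23) and count per bucket
   SELECT CAST(STRFTIME('%H', starttime) AS INTEGER) AS HourStart, COUNT(*) AS cnt FROM Events
   GROUP BY HourStart
)
ORDER BY cnt DESC  -- busiest hour first
LIMIT 1;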

You can check the code at SQL Fiddle
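The query above is fixed at a 1-hour range. As a rough sketch of step 1 of the algorithm (choosing the range duration), the bucket width can be passed in as a parameter. This uses Python's standard `sqlite3` module only so it can be run end to end; the `peak_range` helper, the `bucket_hours` parameter and the sample rows are made up for illustration, while the `Events` table and `starttime` column follow the query above.

```python
import sqlite3

def peak_range(conn, bucket_hours=1):
    """Return (start_hour, end_hour, count) for the busiest bucket,
    or None if the Events table is empty. Hypothetical helper."""
    row = conn.execute(
        """
        SELECT (CAST(STRFTIME('%H', starttime) AS INTEGER) / :w) * :w AS range_start,
               COUNT(*) AS cnt
        FROM Events
        GROUP BY range_start
        ORDER BY cnt DESC, range_start   -- ties go to the earlier bucket
        LIMIT 1
        """,
        {"w": bucket_hours},
    ).fetchone()
    if row is None:
        return None
    start, cnt = row
    return start, start + bucket_hours, cnt

# Made-up sample data in the YYYY-MM-DDTHH:MM:SS text format from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Events (starttime TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Events (starttime) VALUES (?)",
    [
        ("2014-03-03T14:05:00",),
        ("2014-03-03T14:40:00",),
        ("2014-03-03T15:10:00",),
        ("2014-03-03T15:20:00",),
        ("2014-03-03T15:59:59",),
        ("2014-03-03T09:00:00",),
        ("2014-03-04T15:30:00",),
    ],
)

one_hour = peak_range(conn, 1)   # busiest 1-hour bucket
two_hour = peak_range(conn, 2)   # busiest 2-hour bucket
print(one_hour, two_hour)
```

Integer division followed by multiplication snaps each hour down to the start of its bucket, so with a 2-hour width both 14:xx and 15:xx land in the bucket starting at 14. Buckets are aligned to midnight, so this does not find a peak window that straddles a bucket boundary.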

Let me know if it resolved your issue.

Dipendu Paul
1

In SQLite you can use a query like this to get the median of a DATETIME column:

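SELECT DATETIME(CAST(AVG(CAST(STRFTIME('%s', T) AS INT)) AS TEXT), 'unixepoch') AS median FROM
(SELECT TEST AS T FROM TABLE1 ORDER BY TEST
-- take 1 row when the count is odd, 2 rows when it is even
LIMIT 2 - (SELECT COUNT(*) FROM TABLE1) % 2
-- skip to the middle row (or the first of the two middle rows)
OFFSET ((SELECT COUNT(*) FROM TABLE1) / 2) -
       (1 - (SELECT COUNT(*) FROM TABLE1) % 2)) T2;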

SQL Fiddle
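To see how the LIMIT/OFFSET arithmetic picks the middle row(s), here is a small sketch that runs the same median query through Python's standard `sqlite3` module on an odd and an even number of rows. The `median_of` helper and the sample timestamps are assumptions made for illustration; `TABLE1` and `TEST` are the names used in the query above.

```python
import sqlite3

MEDIAN_SQL = """
SELECT DATETIME(CAST(AVG(CAST(STRFTIME('%s', T) AS INT)) AS TEXT), 'unixepoch') AS median
FROM (
    SELECT TEST AS T FROM TABLE1 ORDER BY TEST
    LIMIT 2 - (SELECT COUNT(*) FROM TABLE1) % 2
    OFFSET ((SELECT COUNT(*) FROM TABLE1) / 2)
           - (1 - (SELECT COUNT(*) FROM TABLE1) % 2)
) T2
"""

def median_of(timestamps):
    """Load the timestamps into a throwaway in-memory table and
    return the median as a 'YYYY-MM-DD HH:MM:SS' string. Hypothetical helper."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TABLE1 (TEST TEXT NOT NULL)")
    conn.executemany("INSERT INTO TABLE1 (TEST) VALUES (?)",
                     [(ts,) for ts in timestamps])
    (median,) = conn.execute(MEDIAN_SQL).fetchone()
    conn.close()
    return median

# Odd count: LIMIT 1, OFFSET 1 -> the single middle row (12:00).
odd = median_of(["2014-03-03T10:00:00",
                 "2014-03-03T12:00:00",
                 "2014-03-03T18:00:00"])

# Even count: LIMIT 2, OFFSET 1 -> the two middle rows (12:00 and 14:00), averaged.
even = median_of(["2014-03-03T10:00:00",
                  "2014-03-03T12:00:00",
                  "2014-03-03T14:00:00",
                  "2014-03-03T18:00:00"])

print(odd, even)
```

Note that this returns a single point in time, not a range, so it answers the median reading of the question and not the mode.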

Hamidreza
  • Thanks for your inputs. I was actually looking to find the mode and not the median. Apologies for the confusion. – Rohan Mar 03 '14 at 11:35
1

Here's what I did to solve the problem:

  1. I created another column that stores just the hour of the timestamp.

  2. I ran the following query:

    SELECT EventHour, COUNT(*) AS CountOfHours FROM EventTable GROUP BY EventHour ORDER BY CountOfHours DESC LIMIT 1;

  3. This gives the peak traffic hour for the event.
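A minimal sketch of steps 1 and 2, run through Python's standard `sqlite3` module so it is self-contained. The `StartTime` column name and the sample rows are assumptions (the post does not name the timestamp column); `EventTable`, `EventHour` and the final query are as given above. Here the hour column is backfilled with an `UPDATE`, though it could equally be set by the application at insert time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# StartTime is a hypothetical name for the TEXT timestamp column.
conn.execute(
    "CREATE TABLE EventTable (StartTime TEXT NOT NULL, EventHour INTEGER)"
)
conn.executemany(
    "INSERT INTO EventTable (StartTime) VALUES (?)",
    [
        ("2014-03-03T14:05:00",),
        ("2014-03-03T14:40:00",),
        ("2014-03-03T15:10:00",),
        ("2014-03-03T15:20:00",),
        ("2014-03-03T15:59:59",),
        ("2014-03-03T09:00:00",),
        ("2014-03-04T15:30:00",),
    ],
)

# Step 1: store just the hour (0-23) as a number for rows that lack it.
conn.execute(
    """
    UPDATE EventTable
    SET EventHour = CAST(STRFTIME('%H', StartTime) AS INTEGER)
    WHERE EventHour IS NULL
    """
)

# Step 2: the query from the post, returning the busiest hour and its count.
peak = conn.execute(
    """
    SELECT EventHour, COUNT(*) AS CountOfHours
    FROM EventTable
    GROUP BY EventHour
    ORDER BY CountOfHours DESC
    LIMIT 1
    """
).fetchone()

print(peak)
```

With this sample data the busiest hour is 15, i.e. the 3 PM to 4 PM range.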

Rohan
  • (+1) for trying to solve the problem yourself. – Hamidreza Mar 03 '14 at 12:00
  • gr8! my suggestion would be not to create an additional column in the datatable, instead try calculating the hour in an inner `SELECT` query – Dipendu Paul Mar 03 '14 at 12:14
  • @DipenduPaul I thought about doing so as well using the strftime function. However, given the number of rows (>10000) and that the timestamp field needs to be stored as TEXT/String in SQLite (as there is no DateTime/TimeStamp field), I thought it would be much better in terms of performance to store the hour as a number and then run the query I've mentioned above. – Rohan Mar 03 '14 at 12:46