19

I know there have been a few posts related to this, but my case is a little bit different and I wanted to get some help on this.

I need to pull some data out of the database that is a cumulative count of interactions by day. currently this is what i have

SELECT
   e.Date AS e_date,
   count(e.ID) AS num_interactions
FROM example AS e
JOIN example e1 ON e1.Date <= e.Date
GROUP BY e.Date;

The output of this is close to what I want but not exactly what I need.

The problem I'm having is the dates are stored with the hour minute and second that the interaction happened, so the group by is not grouping days together.

This is what the output looks like.

enter image description here

On 12-23 theres 5 interactions but its not grouped because the time stamp is different. So I need to find a way to ignore the timestamp and just look at the day.

If I try GROUP BY DAY(e.Date) it groups the data by the day only (i.e everything that happened on the 1st of any month is grouped into one row) and the output is not what I want at all.

enter image description here

GROUP BY DAY(e.Date), MONTH(e.Date) is splitting it up by month and the day of the month, but again the count is off.

enter image description here

I'm not a MySQL expert at all so I'm puzzled on what i'm missing

Giacomo1968
  • 25,759
  • 11
  • 71
  • 103
John Ruddell
  • 25,283
  • 6
  • 57
  • 86

2 Answers2

23

New Answer

At first, I didn't understand you were trying to do a running total. Here is how that would look:

SET @runningTotal = 0;
SELECT 
    e_date,
    num_interactions,
    @runningTotal := @runningTotal + totals.num_interactions AS runningTotal
FROM
(SELECT 
    DATE(eDate) AS e_date,
    COUNT(*) AS num_interactions
FROM example AS e
GROUP BY DATE(e.Date)) totals
ORDER BY e_date;

Original Answer

You could be getting duplicates because of your join. Maybe e1 has more than one match for some rows which is inflating your count. Either that or the comparison in your join is also comparing the seconds, which is not what you expect.

Anyhow, instead of chopping the datetime field into days and months, just strip the time from it. Here is how you do that.

SELECT
   DATE(e.Date) AS e_date,
   count(e.ID) AS num_interactions
FROM example AS e
JOIN example e1 ON DATE(e1.Date) <= DATE(e.Date)
GROUP BY DATE(e.Date);
clhereistian
  • 1,261
  • 1
  • 11
  • 19
  • thank you for your response. stripping out the time is what i was wanting to do. I'm not getting the right results still so I think it has to do with the join. what I want is if the count is 3 on the first day and then 4 on the second day for it to display 7 for the second day(the cumulative amount of the previous with the current). the count is inflated still with joining by DATE() http://screencast.com/t/qcsRcZt7Ws4z... theres only 5 interactions on 2013-12-23.. so I guess i need to look into a different way to accomplish this. – John Ruddell Mar 09 '14 at 02:57
  • So the problem with the join is its matching each record with all possible ones. aka.. if theres 5 instances then it matches 1 with all 5 then the next one with all 5 so the count is inflated to 25 instead of 5. any thoughts as to how to fix this? – John Ruddell Mar 09 '14 at 03:41
  • 1
    Sorry, I didn't understand that you were trying to do a running total. I'll add to my answer – clhereistian Mar 09 '14 at 19:43
  • 1
    This is great, since it will work well even with large data sets – El Gucs Sep 29 '16 at 00:04
  • 2
    Great answer @clhereistian (the table is read just once)! Thanks, it helps me a lot. Tip: if you're using an ORM (like Doctrine in PHP) that can accepts just one query per execution : replace the `SET @runningTotal = 0;` by `@runningTotal := 0,` in the subquery :) (This tip can also helps you/me to make multi-cumulative sum with UNION between queries) – didier2l Jul 30 '18 at 10:11
  • Thank you very much, that accelerated my previous subquery-based solution by far – shadowhorst May 16 '19 at 11:26
14

I figured out what I needed to do last night... but since I'm new to this I couldn't post it then... what I did that worked was this:

SELECT
   DATE(e.Date) AS e_date,
   count(e.ID) AS num_daily_interactions,
   (
      SELECT 
         COUNT(id)
      FROM example 
      WHERE DATE(Date) <= e_date
   ) as total_interactions_per_day
FROM example AS e
GROUP BY e_date;

Would that be less efficient than your query? I may just do the calculation in python after pulling out the count per day if its more efficient, because this will be on the scale of thousands to hundred of thousands of rows returned.

Mailerdaimon
  • 6,003
  • 3
  • 35
  • 46
John Ruddell
  • 25,283
  • 6
  • 57
  • 86
  • 1
    Your query looks good. Just make sure your date comparison is getting what you expect since you are comparing a date type to a datetime type. – clhereistian Mar 10 '14 at 05:09
  • the response is what i want. i just am a little hesitant because it will be slow as this table grows. again thanks for everything.. very helpful – John Ruddell Mar 10 '14 at 05:17
  • I'd try EXPLAINing both queries. I feel this one will run the subquery for each date, which shouldn't be much slow if we have the proper index, but @clhereistian's does a single group by and then just iterates over it, which may be better (in particular if - it was my case - you have extra, non-indexed filtering conditions). – chesterbr Jun 27 '14 at 14:59
  • @chesterbr I ended up changing this query because it was taking a while. if you look at my more recent questions I ask a question about optimizing it.. ended up going with two queries and just doing the cumulative count in python... the query started taking 14-15 seconds to run when comparing 300,000 records to itself.. thats 90,000,000,000 rows to check.. its indexed and optimized just still takes a bit with that much to compare. – John Ruddell Jun 27 '14 at 15:37