0

Ok so I have a table with, amongst other, columns: name, c_id, date.

Certain entries in that table obey a certain criteria, which I can effectively select using WHERE, so this is not a problem.

What I would like, then, is a way to group by, in addition to the name and c_id columns, a group of three dates: date - 1, date, and date + 1. In other words, I want each row of the output to represent all entries that have the same name and c_id as a certain relevant entry and which happened between the day before and the day after that entry, including itself.

How would I go about doing that?

--EDIT:

(EDIT2: The origin table is supposed to be an INNER JOIN of Table1 and Table2 ON Table1.id = Table2.id)

Sample data:

Table1:
  id  |  c_id  |     date     |  other stuff
-----------------------------------------------------
  01  |  abc   |  2015/12/09  |  whatever
  02  |  abc   |  2015/12/09  |  whatever
  03  |  abc   |  2015/12/10  |  relevant criterion
  04  |  abc   |  2015/12/11  |  whatever
  05  |  def   |  2015/11/15  |  whatever
  06  |  def   |  2015/11/16  |  relevant criterion
  07  |  abc   |  2015/11/17  |  whatever
  08  |  mnc   |  2016/01/02  |  whatever
  09  |  mnc   |  2016/01/02  |  whatever
  10  |  mnc   |  2016/01/03  |  whatever
  11  |  mnc   |  2016/01/03  |  whatever
  12  |  mnc   |  2016/01/03  |  whatever
  13  |  mnc   |  2016/01/04  |  relevant criterion
  14  |  mnc   |  2016/01/05  |  whatever
  15  |  mnc   |  2016/01/05  |  whatever
  16  |  mnc   |  2016/01/06  |  whatever

Table2:
  id  |  Name  |  other stuff
--------------------------------------
  01  |  John  |  whatever
  02  |  John  |  whatever
  03  |  John  |  whatever
  04  |  John  |  whatever
  05  |  Mary  |  whatever
  06  |  Mary  |  whatever
  07  |  Mary  |  whatever
  08  |  Alice |  whatever
  09  |  Alice |  whatever
  10  |  Alice |  whatever
  11  |  Alice |  whatever
  12  |  Alice |  whatever
  13  |  Alice |  whatever
  14  |  Alice |  whatever
  15  |  Alice |  whatever
  16  |  Alice |  whatever

Sample desired output:

  Name  |  c_id  |  pivot_date  |  count
------------------------------------------
  John  |  abc   |  2015/12/10  |  4
  Mary  |  def   |  2015/11/16  |  2
  Alice |  mnc   |  2016/01/04  |  6

(The pivot_date part is not particularly necessarily the one with the relevant criterion, any one of the dates involved are good.)

Pedro Carvalho
  • 565
  • 1
  • 6
  • 26
  • 1
    Can you post the SQL you have tried so far? – Haymaker Jan 06 '16 at 12:55
  • A `GROUP BY` without an aggregate is pretty much useless. Are you sure you want to `GROUP BY`? - In any case, you can use a self join to match the records you want. – JimmyB Jan 06 '16 at 12:55
  • I'm not sure I want to group by! I'm sure about the outputs I want, but I don't particularly favour any way of getting them. I'm not sure how the self join would find matching records? The entries I'm looking for will typically have a few dozen entries between the previous and the next day. (And @Haymaker: I haven't tried anything yet, I do not even know how to begin!) – Pedro Carvalho Jan 06 '16 at 12:57
  • `SELECT ... FROM myTable t1 INNER JOIN myTable t2 ON t1.name = t2.name AND t1.c_id = t2.c_id AND t2.date >= t1.date-1 AND t2.date <= t1.date+1` – JimmyB Jan 06 '16 at 12:59
  • Please edit your question and provide sample data and desired results. – Gordon Linoff Jan 06 '16 at 13:14
  • Added sample data and sample desired output. – Pedro Carvalho Jan 06 '16 at 13:34
  • are you assuming a set of 4 day date ranges? and you want a count of any rows that fall into those ranges? do those date ranges start on a specific date? (which date?) will you provide the date ranges or do you expect these to be calculated? – Paul Maxwell Jan 06 '16 at 14:56
  • I am not assuming 4 day date ranges: I want /all/ rows with the same name-c_id pair that happened within a day of one such row that fulfills a certain criterion. I want a count of the number of rows that fall within those ranges, but that is not all I want, but the other things I want are easy enough to get if I have that count. They do not start on a specific date. I expect them to be calculated. – Pedro Carvalho Jan 06 '16 at 14:59
  • I do not understand, sorry. – Paul Maxwell Jan 06 '16 at 15:05
  • Which part do you want me to elaborate on? Does the modified Alice example in the original question help any? – Pedro Carvalho Jan 06 '16 at 15:07
  • Suppose John has consecutive dates from the 9th through the 12th. Do you want that split as 9th-11th;12th or 9th;10th-12th? This is just one example and a big part of the problem is defining how to partition the dates into blocks. – shawnt00 Jan 06 '16 at 17:16
  • Any possible split works, I have no preferences about any of them. I have a slight preference about non-repetition (i.e. if the 11th for John on a given c_id is in a block it oughtn't be in another), but I would be okay if there was a significantly easier solution that meant repetitions would happen, especially because I don't expect the relevant criterion to exist in dates too close to each other. – Pedro Carvalho Jan 06 '16 at 17:22
  • Here's a fiddle with the two tables set up: http://sqlfiddle.com/#!9/0a80b – shawnt00 Jan 06 '16 at 18:36

4 Answers4

0

Updated for new sample data:

  SELECT t.name, t.c_id, t.date pivot_date, COUNT(*) count
    FROM record t
    JOIN record t2
      ON t2.name = t.name
     AND t2.c_id = t.c_id
     AND t2.date >= t.date - INTERVAL 1 DAY        
     AND t2.date <= t.date + INTERVAL 1 DAY
   WHERE t.other_stuff = 'relevant criterion'
GROUP BY t.name, t.c_id, t.date

See SQLFiddle

--

Updated for new sample data:

  SELECT t2.name, t1.c_id, t1.date pivot_date, COUNT(*) count
    FROM table1 t1
    JOIN table1 to1
      ON to1.c_id = t1.c_id
     AND to1.date >= t1.date - INTERVAL 1 DAY        
     AND to1.date <= t1.date + INTERVAL 1 DAY
    JOIN table2 
      ON t2.id = t1.id
   WHERE t1.other_stuff = 'relevant criterion'
GROUP BY t2.name, t1.c_id, t1.date
Arth
  • 12,789
  • 5
  • 37
  • 69
  • What if record is (Table1 tk INNER JOIN Table2 tm ON tk.id = tm.id)? Just replacing it gives me a syntax error. – Pedro Carvalho Jan 06 '16 at 13:24
  • @PedroCarvalho that should be (SELECT * FROM Table1 tk INNER JOIN Table2 tm ON tk.id = tm.id), but you may not need all of it. – Arth Jan 06 '16 at 13:54
  • That gets rid of the syntax error but gets me a "Duplicate column c_id" error. – Pedro Carvalho Jan 06 '16 at 14:34
  • @PedroCarvalho Your sample data is all in one table, my answer should work for your sample data.. if you want more help you'll have to update your sample data. – Arth Jan 06 '16 at 16:29
  • How do I change my sample data to reflect that the table is generated from an INNER JOIN of two tables? I'm just saying that there isn't a table called "record" from which I'm selecting this stuff, that "record" is in fact an INNER JOIN of two tables, and that replacing the word "record" with that INNER JOIN is not enough. – Pedro Carvalho Jan 06 '16 at 16:43
  • @PedroCarvalho I'm not sure how I can be any more obvious.. Provide sample data that mimics your ORIGINAL data in the ORIGINAL form of two tables! – Arth Jan 06 '16 at 16:46
  • Updated the sample data. – Pedro Carvalho Jan 06 '16 at 17:05
0

This is a way:

@dt := '2015-12-10'

SELECT Name , c_id, MAX(`date`) maxdate, COUNT(*) countof
FROM table1
WHERE date >= @dt - INTERVAL 1 DAY  AND `date` < @dt  + INTERVAL 2 DAY
GROUP BY Name , c_id

Note the date column could be MIN(date) as an alternative

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
0

I had to build this in Oracle using analytic LAG/LEAD functions. I don't know if you can simulate those functions in MYSQL (look here for simulation)

table1 is your data

with 
relevant as
(
select *
  from (
    select 
      nm,
      c_id,
      stuff,
      dt,
      lag(dt,1) over ( order by nm, dt, c_id) prev,
      lead(dt,1) over ( order by nm, dt, c_id) nxt
    from table1 )
where stuff = 'relevant criterion'
)
select 
    d.nm,
    d.c_id,
    to_char(d.dt,'DD-MON-YYY') dt,
    d.stuff,
    count(*) over (partition by d.nm, d.c_id) cnt
  from table1 d
where ( d.dt in ( select rp.prev from relevant rp where d.nm = rp.nm and d.c_id = rp.c_id ) 
        OR
        d.dt in ( select rn.nxt from relevant rn where d.nm = rn.nm and d.c_id = rn.c_id  )
        OR
        d.dt in ( select rn.dt from relevant rn where d.nm = rn.nm and d.c_id = rn.c_id  )
      )
;

Results:

NM  C_ID    DT  STUFF   CNT
Alice   mnc 03-JAN-016  whatever    6
Alice   mnc 03-JAN-016  whatever    6
Alice   mnc 03-JAN-016  whatever    6
Alice   mnc 05-JAN-016  whatever    6
Alice   mnc 04-JAN-016  relevant criterion  6
Alice   mnc 05-JAN-016  whatever    6
John    abc 11-DEC-015  whatever    4
John    abc 10-DEC-015  relevant criterion  4
John    abc 09-DEC-015  whatever    4
John    abc 09-DEC-015  whatever    4
Mary    def 15-NOV-015  whatever    2
Mary    def 16-NOV-015  relevant criterion  2

You can filter above results by creating this query as a view and adding where stuff = 'relevant criterion' filter criteria

NM  C_ID    DT  STUFF   CNT
Alice   mnc 04-JAN-016  relevant criterion  6
John    abc 10-DEC-015  relevant criterion  4
Mary    def 16-NOV-015  relevant criterion  2
Community
  • 1
  • 1
mevdiven
  • 1,902
  • 2
  • 17
  • 33
0

Here's a naive way to do it. It sort of works:

select t2.Name, t1.c_id, min(t1.Date) as pivot_date, count(*) as cnt
from Table1 t1 inner join Table2 t2 on t2.id = t1.id
where <criteria...>
group by t2.Name, t1.c_id, datediff(t1.Date, '2015-01-01') div 3

This approach arbitrarily divides the calendar into three day block based on a starting point.

It would be possible to have consecutive dates that form the end of the first block and the beginning of the next. I doubt you want that and I'll see if I can find something that gives a better balance.

How complicated are the filtering criteria that you left out?

shawnt00
  • 16,443
  • 3
  • 17
  • 22