0

I have a data like this

  student_id         phase           date
         34          submitted       05-03-2019
         34          review          15-04-2019
         78          submitted       06-12-2018
         34          submitted       25-04-2019
         78          review          01-01-2019
         34          review          08-05-2019

What I want is something like this

student_id     GapDays  submission-date   reviewdate
34             41       05-03-2019        15-04-2019
34             13       25-04-2019        08-05-2019
78             26       06-12-2018        01-01-2019

What i want is the datecolumn difference between submitted and review per student_id

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
mimaku
  • 27
  • 4
  • If you're using MySQL, then please don't tag with sql-server (which is for Microsoft SQL Server);. – Mark Rotteveel Apr 20 '20 at 09:40
  • I think that you want to do a crosstab, like this https://stackoverflow.com/questions/15997090/crosstab-view-in-mysql/51532376 right? – Oscar Gallardo Apr 20 '20 at 09:46
  • 2
    Your data model is not really good : you're missing and `id_project` for project. Student 34 have 4 lines and you can't be sure the `review` is for the good project as you don't have ID. – Blag Apr 20 '20 at 09:54
  • 1
    @mimaku . . . You actually have nothing that distinguishes the different submissions? That seems quite awkward. What if the data is submit-submit-review-review? How do you know which review matches which submission? – Gordon Linoff Apr 20 '20 at 11:22

2 Answers2

0

This would have been much simpler (and more efficient) in MySQL 8.0, where we would use window functions for this.

In earler versions, one option is to use a correlated subquery to retrieve the date of the next review for each submission:

select 
    t.*,
    datediff(review_date, submission_date) gapdays
from (
    select
        student_id,
        date submission_date
        (
            select min(t1.date)
            from mytable t1 
            where 
                t1.student_id = t.student_id 
                and t1.date > t.date
                and t1.phase = 'review'
        ) review_date
    from mytable t
    where t.phase = 'submitted'
) t
order by student_id, submission_date

For performance, consider an index on (student_id, phase, date) (the ordering of the columns in the index is important here).

Demo on DB Fiddle:

| student_id | submission_date | review_date | gapdays |
| ---------- | --------------- | ----------- | ------- |
| 34         | 2019-03-05      | 2019-04-15  | 41      |
| 34         | 2019-04-25      | 2019-05-08  | 13      |
| 78         | 2018-12-06      | 2019-01-01  | 26      |
GMB
  • 216,147
  • 25
  • 84
  • 135
0

With a self join and aggregation:

select 
  t1.student_id, 
  datediff(min(t2.date), t1.date) gapdays,
  t1.date submissiondate,
  min(t2.date) reviewdate
from tablename t1 inner join tablename t2
on t1.student_id = t2.student_id and t1.date <= t2.date
and t1.phase = 'submitted' and t2.phase = 'review' 
group by t1.student_id, t1.phase, t1.date

If you also want in the results the submissions that are not yet reviewed you can change the inner join to a left join.
See the demo.
Results:

> student_id | gapdays | submissiondate | reviewdate
> ---------: | ------: | :------------- | :---------
>         34 |      41 | 2019-03-05     | 2019-04-15
>         34 |      13 | 2019-04-25     | 2019-05-08
>         78 |      26 | 2018-12-06     | 2019-01-01
forpas
  • 160,666
  • 10
  • 38
  • 76