0

I have a table with a list of registers where each one represents an event, I must group this registers that have happened in the same day. Also I must group the events in the day, grouping by id_fertilizer and calibrado, but I can't group the first result with the last ones.

my SQL result is something like this:

    work_date     id_fertilizer calibrado area   begin_date    end_date
[1] '2014-07-22'  43            NULL      0      "07:03:42.0"  "07:08:00.0"
[2] '2014-07-22'  49            NULL      0      "07:08:52.0"  "07:44:04.0"
[3] '2014-07-22'  49            true      54101  "07:49:41.0"  "12:00:05.0"
[4] '2014-07-22'  49            true      4893   "12:00:30.0"  "14:06:13.0"
[5] '2014-07-22'  43            NULL      0      "14:06:51.0"  "14:49:30.0"
[6] '2014-07-22'  43            NULL      12397  "14:50:04.0"  "16:12:03"

where I must group the lines 3 and 4 in one line and the lines 5 and 6 in another one, taking the min() begin_date and max() end_date and sum() the area
lines 1 and 2 are independent lines. At the end I must have 4 lines: 1, 2, (3+4), (5+6)

The result I've got is something like this (using the function window), but it's wrong, because it have grouped the lines 1,5,6 in one line:

work_date                id_fertilizer  calibrado  area  begin_date    end_date
"2014-07-22 00:00:00.0"  43             NULL       1     "07:03:42.0"  "16:12:03.0"
"2014-07-22 00:00:00.0"  49             NULL       0     "07:08:52.0"  "07:44:04.0"
"2014-07-22 00:00:00.0"  49             true       5     "07:49:41.0"  "14:06:13.0"

My result, with 3 lines says the guys worked from 07:03:42 to 16:12:03 with id_fertilizer 43, but also says he worked from 07:08:52 07:44:04 with id_fertilizer 49. It does not make sense, I must respect the chronological order of the events. So, the result I am expecting is this:

      work_date     id_fertilizer  calibrado  area    begin_date    end_date
[1]   '2014-07-22'  43             NULL       0       "07:03:42.0"  "07:08:00.0"
[2]   '2014-07-22'  49             NULL       0       "07:08:52.0"  "07:44:04.0"
[3]   '2014-07-22'  49             true       58994   "07:49:41.0"  "14:06:13.0"
[4]   '2014-07-22'  43             NULL       12397   "14:06:51.0"  "16:12:03"
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
ViniciusCR
  • 83
  • 1
  • 7
  • 1
    Please clean up the question. It's unclear in multiple areas. How do you aggregate `area`? You write `At the end I must have 4 lines`, just to display *3 lines*, etc. Please clarify. And add your version of Postgres as well as what you have tried (even if it's not working). Ideally also your table definition or a test case. – Erwin Brandstetter Aug 25 '14 at 22:11
  • Thanks Erwin, I've add some more information to clarify my problem. – ViniciusCR Aug 26 '14 at 11:48
  • I updated my answer accordingly. – Erwin Brandstetter Aug 26 '14 at 12:54

1 Answers1

0

An educated guess:

SELECT work_date, id_fertilizer, calibrado
     , sum(area)       AS area
     , min(begin_date) AS begin_date
     , max(end_date)   AS end_date
FROM  (
   SELECT *
     , count(step) OVER (ORDER BY work_date, id_fertilizer, begin_date) AS grp
   FROM  (
      SELECT *
           , lag(calibrado) OVER (ORDER BY work_date, id_fertilizer, begin_date)
             IS DISTINCT FROM calibrado AS step
      FROM   tbl
      ) sub1
   ) sub2
GROUP  BY work_date, id_fertilizer, calibrado, grp
ORDER  BY work_date, id_fertilizer, calibrado, grp;

This produces your updated result.

Related answers with more explanation:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Hi Erwin, I've tried this solution but it does not work. I can't group by begin_date. Also, if I remove the begin_date it does not work how I was expecting. I could not understand the logic of this selec. – ViniciusCR Aug 26 '14 at 14:14
  • @ViníciusCR: My solution does not group by `begin_date`. Read the answers I linked to for explanation on the used techniques. – Erwin Brandstetter Aug 26 '14 at 15:11