0

I've got a database full of weather data...specifically the date, max temp, min temp, and daily rainfall for more than 100 years. I'm trying to find the maximum temperature for each day and the specific date that it occurred over the entire 100+ years.

My table is set up similar to below...

+-------+------------+------+------+------+
| id    | date       | thi  | tlo  | rain |
+-------+------------+------+------+------+
| 42856 | 2016-01-01 |   49 |   39 | 0.00 |
| 42857 | 2016-01-02 |   51 |   38 | 0.00 |
| 42858 | 2016-01-03 |   60 |   37 | 0.00 |
| 42859 | 2016-01-04 |   54 |   32 | 0.00 |
| 42860 | 2016-01-05 |   47 |   32 | 0.00 |
+-------+------------+------+------+------+
5 rows in set (0.01 sec)

I want to find the max(thi) for each day of the year and the date in which it occurred. This data goes back to 1899 so there are 117 January's in the database and so on for each year.

I have come up with the following so far...

select date, max(thi),
-> DAY(date)
-> from dfw where MONTH(date)='01'
-> group by DAY(date);
+------------+----------+-----------+
| date       | max(thi) | DAY(date) |
+------------+----------+-----------+
| 1899-01-01 |       83 |         1 |
| 1899-01-02 |       78 |         2 |
| 1899-01-03 |       84 |         3 |
| 1899-01-04 |       81 |         4 |
| 1899-01-05 |       82 |         5 |
| 1899-01-06 |       79 |         6 |
| 1899-01-07 |       83 |         7 |
| 1899-01-08 |       88 |         8 |
| 1899-01-09 |       82 |         9 |
| 1899-01-10 |       79 |        10 |
| 1899-01-11 |       83 |        11 |
| 1899-01-12 |       82 |        12 |
| 1899-01-13 |       78 |        13 |
| 1899-01-14 |       79 |        14 |
| 1899-01-15 |       80 |        15 |
| 1899-01-16 |       81 |        16 |
| 1899-01-17 |       79 |        17 |
| 1899-01-18 |       80 |        18 |
| 1899-01-19 |       84 |        19 |
| 1899-01-20 |       83 |        20 |
| 1899-01-21 |       79 |        21 |
| 1899-01-22 |       85 |        22 |
| 1899-01-23 |       88 |        23 |
| 1899-01-24 |       82 |        24 |
| 1899-01-25 |       84 |        25 |
| 1899-01-26 |       82 |        26 |
| 1899-01-27 |       81 |        27 |
| 1899-01-28 |       85 |        28 |
| 1899-01-29 |       84 |        29 |
| 1899-01-30 |       86 |        30 |
| 1899-01-31 |       93 |        31 |
+------------+----------+-----------+
31 rows in set (0.01 sec)

This gives me the maximum for each day in January which is good...but I need the date on which it occurred. For some reason all I am getting is 1899.

For example on January 31...the max(thi) is 93 but it occurred on 1911-01-31. There are also times in which the max(thi) could have occurred in multiple years. On January 30...the max(thi) is 86 which occurred on 1906-01-30 and 1994-01-30.

Is there a way to do this in MySQL or am I just out of luck? Thanks in advance!

Jason Dunn
  • 77
  • 1
  • 8

1 Answers1

2

The value returned for date expression in your SELECT is indeterminate. MySQL is free to return a date value from any row in the group. (Other databases would throw an error with this query. A MySQL specific extension to GROUP BY allows the query to run, but we can get MySQL to more closely conform to the SQL standard, and throw an error with this query, by including ONLY_FULL_GROUP_BY in sql_mode.)

You've got a good start.

          SELECT DATE_FORMAT(n.date,'%m%d') AS mmdd 
               , MAX(n.thi) AS max_thi
            FROM dfw
           GROUP BY DATE_FORMAT(n.date,'%m%d')   

To get the year, there's a couple of approaches. One is to use the query as an inline view, and join to the original table to find a matching row, one with the same month and day, and the same thi value.

You can use either the MAX() or MIN() aggregate to get the latest or earliest date.

 SELECT m.mmdd
      , m.thi
      , MAX(t.date)  AS latest_date
      , MIN(t.date)  AS earliest_date
   FROM ( 
          SELECT DATE_FORMAT(n.date,'%m%d') AS mmdd 
               , MAX(n.thi) AS thi
            FROM dfw
           GROUP BY DATE_FORMAT(n.date,'%m%d')   
        ) m
  JOIN dfw t
    ON t.thi                      = m.thi
   AND DATE_FORMAT(t.date,'%m%d') = m.mmdd  
 GROUP BY m.mmdd
 ORDER BY m.mmdd

If you want to return all years for a given mmdd that the highest thi occurred, remove the GROUP BY clause, and the aggregate from around t.date

 SELECT m.mmdd
      , m.thi
      , t.date
   FROM ( 
          SELECT DATE_FORMAT(n.date,'%m%d') AS mmdd 
               , MAX(n.thi) AS thi
            FROM dfw
           GROUP BY DATE_FORMAT(n.date,'%m%d')   
        ) m
  JOIN dfw t
    ON t.thi                      = m.thi
   AND DATE_FORMAT(t.date,'%m%d') = m.mmdd  
 ORDER BY m.mmdd, t.date

As another alternative, to get the earliest date that thi occurred, you could use a correlated subquery in the SELECT list:

 SELECT DATE_FORMAT(n.date,'%m%d') AS mmdd 
      , MAX(n.thi) AS thi
      , ( SELECT t.date
            FROM dfw t
           WHERE DATE_FORMAT(t.date,'%m%d') = DATE_FORMAT(n.date,'%m%d')
             AND t.thi = n.thi
           ORDER BY t.date 
           LIMIT 0,1
        ) AS earliest_date 
   FROM dfw n
  GROUP BY DATE_FORMAT(n.date,'%m%d')   
  ORDER BY DATE_FORMAT(n.date,'%m%d')   
spencer7593
  • 106,611
  • 15
  • 112
  • 140