1

I am looking for the correct query for my mysql db that has 2 seperate tables for lengths and weights. I want to have the result returned as 1 query with 3 columns: datetime, length and weight. The query should also allow to specify the user. Eg.:

Table heights:

id  user_id  created_on           height 
 1  2        2019-01-01 00:00:01  180  
 2  2        2019-01-02 00:00:01  181  
 3  3        2019-01-03 00:00:01  182  
 4  3        2019-01-04 00:00:01  183  
 5  2        2019-01-07 00:00:01  184  

Table weights:

 id  user_id  created_on           weight
 1   2        2019-01-01 00:00:01  80   
 2   2        2019-01-04 00:00:01  81   
 3   3        2019-01-05 00:00:01  82   
 4   3        2019-01-06 00:00:01  83   
 5   2        2019-01-07 00:00:01  84  

I am looking to get the following result with a single query:

user_id  created_on             weight  height 
2        2019-01-01 00:00:01    80      180  
2        2019-01-02 00:00:01    null    181  
2        2019-01-04 00:00:01    81      null  
2        2019-01-07 00:00:01    84      184  

I have tried working with JOIN statements but fail to get the required result. This join statement

SELECT w.* , h.* FROM weight w 
  JOIN height h 
    ON w.created_on=h.created_on
    AND w.user_id=h.user_id AND user_id=2

will return only those results that have both a height and weight item for user_id and created_on A full outer join would do the trick, however this is not supported by mysql. The following query seems to be returning the required result, however it is very slow:

SELECT r.* FROM 
    (SELECT w.user_id as w_user, w.created_on as weightdate, w.value as weight, h.created_on as heightdate ,h.user_id as h_user, h.value as height FROM weight w
    LEFT JOIN height h ON w.user_id = h.user_id 
    AND w.created_on=h.created_on 
    UNION 
    SELECT w.user_id as w_user, w.created_on as weightdate, w.value as weight, h.created_on as heightdate ,h.user_id as h_user, h.value as height FROM weight w
    RIGHT JOIN height h ON w.user_id = h.user_id 
    AND w.created_on=h.created_on ) r 
WHERE h_user=2 OR w_user =2

The query takes more than 3 seconds if the 2 tables have around 3000 entries. Is there a way to speed this up, possibly using a different approach?

For extra bonus points: is it possible to allow for a small time discrepancy between both created_on datetimes? (eg. 10 minutes or within the same hour). Eg. if column weight has an entry for 2019-01-01 00:00:00 and table height has an entry for height at 2019-01-01 00:04:00 they appear in the same row.

bolvo
  • 361
  • 3
  • 13

2 Answers2

1

Instead of using a calendar table to select dates of interest, you can use a UNION to select all the distinct dates from the heights and weights tables. To deal with matching times within an hour of each other, you can compare the times using TIMESTAMPDIFF and truncate the created_on time to the hour. Since this might create duplicate entries, we add the DISTINCT qualifier to the query:

SELECT DISTINCT COALESCE(h.user_id, w.user_id) AS user_id,
       DATE_FORMAT(COALESCE(h.created_on, w.created_on), '%y-%m-%d %H:00:00') AS created_on,
       w.weight,
       h.height
FROM (SELECT created_on FROM heights
      UNION
      SELECT created_on FROM weights) d
LEFT JOIN heights h ON ABS(TIMESTAMPDIFF(HOUR, h.created_on, d.created_on)) = 0 AND h.user_id = 2
LEFT JOIN weights w ON ABS(TIMESTAMPDIFF(HOUR, w.created_on, d.created_on)) = 0 AND w.user_id = 2
WHERE h.user_id IS NOT NULL OR w.user_id IS NOT NULL
ORDER BY created_on

Output (from my demo, where I've modified your times to allow for matching within the hour):

user_id     created_on          weight  height
2           19-01-01 01:00:00   80      180
2           19-01-02 00:00:00           181
2           19-01-04 04:00:00   81  
2           19-01-07 06:00:00   84      184

Demo on dbfiddle

Nick
  • 138,499
  • 22
  • 57
  • 95
  • Thank you for your help Nick, bonus points for the time issue. Although faster, this query is still quite slow (2.2 sec vs 3 in my solution). Good improvement nonetheless. Edit: to be clear: the (INNER) JOIN query in my original post is returned in milliseconds – bolvo Jan 15 '19 at 13:28
  • @bolvo the main thing slowing this down would be generating the list of dates and dealing with the time issue. Using a pre-built calendar table (as Tim described in his answer and comments) would improve that issue. Simplifying the time matching would also help - updating the table so that all times were rounded to the nearest 10 minutes/hour (and also performing that rounding on insert) would allow a direct compare to be made which would be faster. – Nick Jan 15 '19 at 21:52
0

This is probably best handled using a calendar table, containing all dates of interest for the query. We can start the query with the calendar table, then left join to the heights and weights tables:

SELECT
    COALESCE(h.user_id, w.user_id) AS user_id,
    d.dt AS created_on,
    w.weight,
    h.height
FROM
(
    SELECT '2019-01-01 00:00:01' AS dt UNION ALL
    SELECT '2019-01-02 00:00:01' UNION ALL
    SELECT '2019-01-03 00:00:01' UNION ALL
    SELECT '2019-01-04 00:00:01' UNION ALL
    SELECT '2019-01-05 00:00:01' UNION ALL
    SELECT '2019-01-06 00:00:01' UNION ALL
    SELECT '2019-01-07 00:00:01'
) d
LEFT JOIN heights h
    ON d.dt = h.created_on AND h.user_id = 2
LEFT JOIN weights w
    ON d.dt = w.created_on AND w.user_id = 2
WHERE
    h.user_id IS NOT NULL OR w.user_id IS NOT NULL
ORDER BY
    d.dt;

enter image description here

Demo

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • hi, thanks for your solution. However I am not clear on the calendar table: I do not know in advance which entries will be available. – bolvo Jan 15 '19 at 13:30
  • @bolvo You should really maintain all the dates which cover your data somewhere. Otherwise, you would have to resort to trickery to generate the dates to be included. This would probably not perform very well. – Tim Biegeleisen Jan 15 '19 at 13:39
  • Hi Tim, I'm sure I don't understand, but users enter the data and can request the data for a certain time frame (which can cover years). The dates are stored in the db table. I am not sure whay you mean by "maintain all the dates" – bolvo Jan 15 '19 at 13:54
  • Then just add a condition to the `WHERE` clause of my query, something like `WHERE d.dt BETWEEN start AND end`. Is there some reason why that would not work for you? – Tim Biegeleisen Jan 15 '19 at 13:58
  • It is the FROM part that does not make sense to me: if a user requests all the items since 2010 I can hardly start writing out all the individual dates and times between today and 2010-01-01 00:00:00? – bolvo Jan 15 '19 at 14:04
  • OK...the subquery I have aliased as `d` is really supposed to be a separate _table_, a calendar table. [See this helpful SO question](https://stackoverflow.com/questions/10132024/how-to-populate-a-table-with-a-range-of-dates) for more information on how to do this. Once you generate the table, it easy to plugin to the query. – Tim Biegeleisen Jan 15 '19 at 14:08