2

I have a table with information and dates, which have some missing ones, so I want to join that table with a calendar table to fill missing dates and set values in another column in the same row to null. This is an example:

Steps | Date
 10   | 2018-04-30
 20   | 2018-04-28

And it want to do the following:

Steps | Date
 10   | 2018-04-30
 null | 2018-04-29
 20   | 2018-04-28

This is what I tried (real query, so you can point out if I'm doing something wrong):

SELECT sum(steps), date(from_unixtime(u.in_date)) as stepdate
    FROM userdata u
        RIGHT JOIN
    time_dimension td
    ON date(from_unixtime(u.in_date)) = td.db_date
    AND user_id = 8
    GROUP BY day(from_unixtime(in_date))
    ORDER BY stepdate DESC;

I expected this query to do what I wanted, but it doesn't. The table time_dimension and its column db_date have all dates (ranging from 2017-01-01 to 2030-01-01), which is the one I'm trying to join userdata's in_date column (which is in unix_time).

Edit: I checked the following questions in SO:

Edit, regarding the duplicate: That question in particular is using intervals and date_add to compare against their table. I am using a calendar table instead to join them. While similar, I don't think they won't have the same solution.

Solution: Thanks to xQBert, who pointed out the mistake:

PROBLEM: Having the group by be on the userdata table as well as the select, you're basically ignoring the time dimension data. There is no 2018-4-29 date in Userdata right (for user 8) Fix the select & group by to source from time dimension data and problem solved.

So, I changed GROUP BY day(from_unixtime(in_date)) to GROUP BY td.db_date.

Newbb
  • 33
  • 7
  • 1
    I would select from the calendar table first and then LEFT JOIN against your data table on the date field(s). – Ross Bush May 02 '18 at 14:31
  • THE PROBLEM: Having the group by be on the userdata table as well as the select, you're basically ignoring the time dimension data. There is no 2018-4-29 date in Userdata right (for user 8) Fix the select & group by to source from time dimension data and problem solved. Note: I doubt you need any function calls for conversion on td.db_date in select and group by as time dimension data appears to be in YYYY-MM-DD format already. – xQbert May 02 '18 at 14:47
  • The GROUP BY needs to include the date field from the calendar table since it contains all the keys, nulls will rollup for mismatches. – Ross Bush May 02 '18 at 14:50
  • @Newbb: how do you know what dates to start/end on? first and last date of ID provided? by outer joining to time_Dimension you'll get all dates in that table. you need to limit by the min/max of a user's dates? – xQbert May 02 '18 at 14:54
  • Yes! @xQbert's comment has the answer. That went well over my head. If you want to add it as an answer, I'll gladly select it as the answer. – Newbb May 02 '18 at 14:55
  • Dup question can't add answers now :( Glad we could help though! – xQbert May 02 '18 at 14:56
  • @RiggsFolly This question already had the right approach was just not sourcing fields from the correct table. – xQbert May 02 '18 at 15:04
  • @xQbert Is that a request to reopen the question? – RiggsFolly May 02 '18 at 15:27
  • Na; just a FYI. I'd request to re-open if needed. – xQbert May 02 '18 at 15:44
  • I edited my question with the solution, and what I did to fix it. I hope it helps someone in case they also have the same problem! – Newbb May 03 '18 at 08:22

1 Answers1

0

You need left join rather than right join or you may also change the position of tables

SELECT sum(steps), date(from_unixtime(td.db_date)) as stepdate
FROM time_dimension td  
LEFT JOIN userdata u
     ON date(from_unixtime(u.in_date)) = td.db_date 
WHERE user_id = 8
GROUP BY date(from_unixtime(td.db_date))
ORDER BY stepdate DESC;

However, this assumes time_dimension table treating as calender table.

Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
  • @Rossbush That's what the time_dimension table is for; it has all dates within a range. (at least that's what I get from the question) " The table time_dimension and its column db_date have all dates (ranging from 2017-01-01 to 2030-01-01)" – xQbert May 02 '18 at 14:33
  • Thanks for the fast answer! That didn't work either; I'm getting the same result. [Here's a screenshot of the result, if it helps](https://imgur.com/b0IwxHj) – Newbb May 02 '18 at 14:34
  • @Newbb.. If you have `time_dimension` table has all possible dates then you can check edited answer as above else you would need to create one calender table and do the joins accordingly. – Yogesh Sharma May 02 '18 at 14:40
  • the problem is the group by `GROUP BY day(from_unixtime(in_date))` should be `group by td.db_date`. The select seems odd too... Why do you need to use the date and from_unixtime functions on a filed that appears to already be a date? If it's not then you're missing those functions on the td.db_date field on the join. – xQbert May 02 '18 at 14:40
  • @YogeshSharma I doubt the select and group by need the `date(from_unixtime(` functions. If they do then the on clause `td.db_date` is wrong. Additionaly by moving the `user_ID = 8` to the where clause don't you eliminate the dates you just left joined to? that limit must be on the join to keep the missing dates. The problem is you end up getting all dates in the time dimension table. – xQbert May 02 '18 at 14:50
  • So to really solve the time dimenion table must be limited to the min/max dates in the userdata table so all dates in time_dimension between userID 8's first entry and last entry. – xQbert May 02 '18 at 14:54