0

I cannot come up with a valid SQL Query for my problem, which I will describe in the scenario of a Blog. I have three tables: User, Blog – for blog entries, and UserStatus which holds the users' status he is assigned.

The tables look like that:

User
ID | Name

Blog
ID | User_ID | Date | Text

UserStatus
ID | User_ID | Valid_From_Date | Status

I guess you can imagine what entries of User and Blog would look like. Here is how UserStatus could look for one user:

ID | User_ID | Valid_From_Date | Status
34 |  7      |  2012-01-01     | Basic
35 |  7      |  2013-04-01     | Premium
36 |  7      |  2014-08-01     | Gold

The user's valid status at a certain date is the most recent one which satisfies Valid_From_Date<=Date. So on '2014-03-30' the valid status of this user is 'Premium'.

Now, what I want is to get all blog entries together with the users' names and valid status.

I have this approach:

SELECT User.Name, UserStatus.Status, Blog.Date, Blog.Text
FROM Blog
JOIN User ON User.ID = Blog.User_ID
JOIN UserStatus ON User.ID = UserStatus.User_ID
JOIN (Select User_ID, max(Valid_From_Date) AS date_for_most_recent_status FROM UserStatus
     WHERE date_for_most_recent_status <= ??? GROUP BY User_ID) AS recent_user_status
ON recent_user_status.User_ID = UserStatus.User_ID
AND date_for_most_recent_status = UserStatus.Valid_From_Date

??? -> Can I relate to the particular Blog.Date of current entry when joining?

And that approach:

SELECT User.Name, UserStatus.Status, Blog.Date, Blog.Text, max(Valid_From)
FROM Blog
JOIN User ON User.ID = Blog.User_ID
JOIN UserStatus ON User.ID = UserStatus.User_ID
WHERE UserStatus.Valid_From_Date <= Blog.Date 
GROUP BY Blog.Date, User.Name, Blog.Text

Here the good thing is that I can relate to the actual Blog.Date since it is just on Select-Statement. However, I don't know how to handle UserStatus.Status, which should be in the GROUP BY expression but cannot be, since I just want the most recent one.

Can anyone help me out here, please?

PravinS
  • 2,640
  • 3
  • 21
  • 25
Peter F
  • 3,633
  • 3
  • 33
  • 45

2 Answers2

1

Correlated subquery taking the value of a column in a table and mapping it to a value in a sub table. In this case we know we want the max valid_from_Date for each user so we use the userID from an table outside the subequery and the userID on the table inside the subquery and return just the max and using that as the criteria to determine which user status record to limit by on the join.

SELECT User.Name, UserStatus.Status, Blog.Date, Blog.Text
FROM Blog
JOIN User 
  ON User.ID = Blog.User_ID
JOIN UserStatus 
  ON User.ID = UserStatus.User_ID
  and Valid_from_date = (Select max(Valid_From_Date)
      FROM UserStatus
      where user_ID = User.ID
      and UserStatus.Valid_from_Date <= Blog.Date)
Peter F
  • 3,633
  • 3
  • 33
  • 45
xQbert
  • 34,733
  • 2
  • 41
  • 62
  • This answer is similar to my first approach and - like that - lacks the check that the valid_from must be less or equal the date of the blog. So this is not a helpful answer to my my question. – Peter F Apr 02 '14 at 14:22
  • Sorry I took `since I just want the most recent one` to mean you only wanted the most current status, otherwise I would have added to the where clause to look at user_Status.valid_from_Date <= Blog.date. – xQbert Apr 02 '14 at 20:11
  • Thanks for your correction! I accepted your answer since this solution conforms with standard SQL and does not rely on unspecified, MySQL specific implementation regarding the "Group by" statement. I learned here that subqueries in "Join-On" conditions can relate to columns of the outer table. In my own approaches I tried to refer to other table columns in queries at the "Join level", which is not possible since that is at the outer table level too. Is that correct? I'd be glad if you elaborated. – Peter F Apr 03 '14 at 08:42
  • 1
    Yes your understanding is correct. The subquery on the join is able to reference row level values at the time of join. This unfortunately means that a query has to be run for every record in the result set which can negatively effect performance. But it is a common method used to resolve this situation. There are other methods to accomplish this as well: Common Table Expressions or analytic functions such as over partition by or over order by but again I don't believe these are supported in mySQL thus the subquery which is more in line with cross database standards. – xQbert Apr 03 '14 at 16:47
1

If you want data user wise then use below:

SELECT User.Name, a.Status, Blog.Date, Blog.Text, a.valid_from_date
FROM Blog
JOIN USER ON User.ID = Blog.User_ID
JOIN
(SELECT user_id,`status`,valid_from_date FROM userstatus ORDER BY valid_from_date DESC) a
ON a.user_id=User.ID
WHERE a.Valid_From_Date <= Blog.Date 
GROUP BY a.user_id;

If you want blog date, user, text wise then use:

SELECT User.Name, a.Status, Blog.Date, Blog.Text, a.valid_from_date
FROM Blog
JOIN USER ON User.ID = Blog.User_ID
JOIN
(SELECT user_id,`status`,valid_from_date FROM userstatus ORDER BY valid_from_date DESC) a
ON a.user_id=User.ID
WHERE a.Valid_From_Date <= Blog.Date 
GROUP BY Blog.Date, a.user_id, Blog.Text;
Zafar Malik
  • 6,734
  • 2
  • 19
  • 30
  • Thank you, the second version retrieves exactly what I wanted. However, I am concerned if selecting a (non-aggregate) column and not referring to it in the 'GROUP BY' statement works stable. It seems that MySQL always returns the first entry, but this is not specified by standard SQL. See here for further discussion: [link](http://stackoverflow.com/questions/1023347/mysql-selecting-a-column-not-in-group-by) – Peter F Apr 02 '14 at 14:37
  • As per my experience with mysql by group by we always get first record, so there should not be any issue. – Zafar Malik Apr 02 '14 at 14:42