-1

How can I return one row for every row appearing in Table1 (dps) joining columns from Table2(dsd) in case they exist?

The join can be done with x, y , id (dsd.id can be NULL on Table2). For every x,y pair can be more than one id. I need to filter by a time range with time column only appearing in Table2 (dsd).

I added s.time IS NULL in the WHERE clause because there are not rows in Table2 (dsd) for every row in Table1 (dps). But I would like to remove rows in Table2 where the dsd.time IS NULL.

SELECT
      s.time
      , dps.x
      , dps.y
      , dps.id AS metric
      , dps.anzahl AS produced
      , s.read_tags AS read_tags
    FROM
      dps
     LEFT JOIN  
        (SELECT MAX(dsd.time) AS time, dsd.x, dsd.y, dsd.id, COUNT(DISTINCT dsd.tagid) AS read_tags 
          FROM dsd
          WHERE (time IS NOT NULL)
          GROUP BY dsd.x, dsd.y, dsd.id
        ) AS s
      ON ( s.x = dps.x AND s.y = dps.y AND (s.id = dps.id OR s.id IS NULL) )           
    WHERE s.time BETWEEN valueA AND valueB
        OR s.time IS NULL
    ORDER BY  s.time 

Maybe a different type of join is better?

EDIT: now I just need to filter the results by min&max of s.x and s.y of the s query, so I only get results within the time range (for those cases where s.time is NULL). So dps.x should be between MIN(s.x) and MAX(s.x) and the same for dps.y

Cabbo
  • 31
  • 11
  • If there are multiple rows, what row do you want to return? Does this answer your question? [Get top 1 row of each group](https://stackoverflow.com/questions/6841605/get-top-1-row-of-each-group) – Thom A Mar 10 '22 at 10:57
  • All of the rows in Table1 always, plus columns from Table2. The ```x, y, id``` is unique, but it is not always present in Table2. In that case, it could be just NULL values in the result – Cabbo Mar 10 '22 at 10:59

1 Answers1

2

Your original attempt appeared very close except for the extra the LEFT JOIN condition of the ID match or NULL.

To clarify, the JOIN clause is stating how the tables are related. In your case, you were trying to match up on the 3 possible unique key parts, the X, Y and ID. Then, by having the LEFT JOIN is proper intent and context that you wanted everything from the first (left) table regardless of a match in the right (alias s of your subquery). The WHERE clause is where you just needed the NULL check, and you did that with your check on the null time. If there was no match on X, Y or ID as a set, the Time column (or any column from the s alias) would have been null. You did not need to just check on the ID column being null.

However, if there WAS a match, then you WOULD have a time and thus being checked against your A and B parameter value range as set.

SELECT
        s.time, 
        dps.x, 
        dps.y, 
        dps.id metric, 
        dps.anzahl produced, 
        s.read_tags 
    FROM
        dps
            LEFT JOIN  
            ( SELECT 
                    dsd.x, 
                    dsd.y, 
                    dsd.id, 
                    MAX(dsd.time) AS time, 
                    COUNT(DISTINCT dsd.tagid) AS read_tags 
                FROM 
                    dsd
                WHERE 
                    dsd.time IS NOT NULL
                GROUP BY 
                    dsd.x, 
                    dsd.y, 
                    dsd.id ) s
                    ON  dps.x = s.x  
                    AND dps.y = s.y 
                    AND dps.id = s.id
    WHERE
            s.time IS NULL
        OR s.time BETWEEN valueA AND valueB
    ORDER BY
        s.time

Finally, the order by. Your choice on how to optimize that. As it is now, since you order by the time, the rows could come back in almost any order except that all the null might float to the top first because they lack a value vs all others that ARE found have a value.

If you want all matches first, you might want to order by

ORDER BY
   case when s.time is null then 2 else 1 end,
   s.time

This way, it forces all NULL records to the SECOND part of the result set (via then 2), otherwise, all other values that DO have a match are listed first (via else 1).

I dont quite get what you mean about a min/max of "X" value. However, that should just be at your outer WHERE clause, such as

WHERE
       (existing criteria)
   AND dps.x >= SomeValue
   AND dps.x <= SomeOtherLimitValue

I don't know how you would be considering a MIN(s.x) or MAX(s.x).. What is the basis of this other than to limit your output of wanting to focus in on a specific range.

DRapp
  • 47,638
  • 12
  • 72
  • 142
  • Thnks for your response, I'll `ORDER BY dps.x, dps.y`. How could I remove the rows in the final result set not between `MIN(s.x) AND MAX(s.x) ` ? – Cabbo Mar 10 '22 at 13:06
  • @Cabbo, edit your original post and clarify what you mean about min/max of the X value. What is that basis. Edit your post, then reply back to me here to look and revise my answer. – DRapp Mar 10 '22 at 13:11
  • Edited the post with details @drapp – Cabbo Mar 10 '22 at 14:25
  • @Cabbo, revised answer – DRapp Mar 10 '22 at 14:49