0

There are many similar questions which I've learned from, but my result set isn't returning the expected results.

My Objective:

Build a query that will return a result set containing all rows in table demo1 with user_id = "admin", and the only row of table demo2 with user_id = "admin". Each row in demo2 has a unique user_id so there's always only one row with "admin" as user_id.

However, I don't want demo2 data to wastefully repeat on every subsequent row of demo1. I only want the first row of the result set to contain demo2 data as non-null values. Null values for demo2 columns should only be returned for rows 2+ in the result set.

Current Status:

Right now my query is returning the appropriate columns (all demo1 and all demo2) but all the data returned from demo2 is null.

Demo1:

id  user_id  product  quantity  warehouse
1   admin    phone    3         A
2   admin    desk     1         D 
3   k45      chair    5         B

Demo2:

id  user_id  employee  job   country
1   admin    james     tech  usa
2   c39      cindy     tech  spain

Query:

SELECT * 
from  demo1
left join  (SELECT * FROM demo2 WHERE demo2.user_id = 'X' LIMIT 1) X
on (demo1.user_id = x.user_id)
WHERE demo1.user_id = 'admin'

Rationale:

The subquery's LIMIT 1 was my attempt to retrieve demo2 values for row 1 only, thinking the rest would be null. Instead, all values are null.

Current Result:

id  user_id  product quantity warehouse id    employee  job   country
1   admin    phone   3        A         null  null      null  null
2   admin    desk    1        D         null  null      null  null

Desired Result:

id  user_id  product quantity warehouse id    employee  job   country
1   admin    phone   3        A         1     james     tech  usa
2   admin    desk    1        D         null  null      null  null

I've tried substituting left join for left inner join, right join, full join, but nothing returns the desired result.

Mathomatic
  • 899
  • 1
  • 13
  • 38
  • What is the point of the subquery? Why `'X'`? Have you looked a the subquery result? PS Please in code questions give a [mcve]--cut & paste & runnable code & desired output & clear specification & explanation. That includes the least code you can give that is code that you show is OK extended by code that you show is not OK. (Debugging fundamental.) PS Read about fundamentals of query optimization/implementation in general & for your DBMS. Don't hypothesize about implementation/performance. – philipxy May 01 '19 at 23:12
  • I read an answer which I can't find now, which had an 'x' so I copied it, and it worked. I'm learning Postgres currently. JNevill said this is the best approach so I tend to trust his judgement although I surely will benchmark it once I go further. I'm not hypothesizing about the performance, although I did ask for his opinion. I'm confused why an answer like the following gets 21 upvotes yet shows no code, yet mine at least included a working query. It's hard to gauge which others' posts are suitable as templates for successful reception: https://stackoverflow.com/q/9536262/4594511 – Mathomatic May 01 '19 at 23:40
  • When you say: `That includes the least code you can give that is code that you show is OK extended by code that you show is not OK`. I don't understand how I can show code that is OK if that's precisely the code I'm seeking in my question. Of course I can provide code that's not OK. I merely left out the table creation queries, if that's what you're mainly referring to. – Mathomatic May 01 '19 at 23:41
  • By "OK" I mean code that is correct for whatever it is supposed to do--not necessarily that it is doing what your goal is. Show code & its specification that is correct & then code & its specificaiton that extends that but is incorrect. Your current presentation is jumbled. PS Re "I merely left out the table creation" (which are not queries, they are delarations) the link & my summary of it explicitly say to include code enough for us to cut & paste & run an example including example input table initialization. Re "performance" you explicitly say "I don't want demo2 data to wastefully repeat". – philipxy May 07 '19 at 07:07
  • Learn what LEFT JOIN ON returns: INNER JOIN ON rows UNION ALL unmatched left table rows extended by NULLs. Always know what INNER JOIN you want as part of an OUTER JOIN. – philipxy May 07 '19 at 07:40
  • Re that highly-upvoted question: It is 7 years old; it is conceptually simple; it is a thing many beginners want to do; it has a clear & easily googled title; it may be linked as a duplicate for many questions since; it should be close voted as a duplicate from 2 years before, but beginners who find it don't know that; it is clear; it shows research; etc. Just focus on making your post a clear presentation following the help links & comments. Eg: "My Objective" 1st paragrapgh is OK but the 2nd unclearly says you want something else. Your title is unintelligible & has no connection to it. Etc. – philipxy May 07 '19 at 08:25

2 Answers2

1

Your join is going to bring through ANY records that satisfies the join condition for your two tables. There is no changing that.

But you could suppress subsequent records in your result set from displaying the matching demo2 record that satisfied the join condition AFTER it's joined:

SELECT demo1.id ,
    demo1.user_id,
    demo1.product,
    demo1.quantity,
    demo1.warehouse 
    CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.id END as demo2_id,    
    CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.employee END AS demo2_employee,  
    CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.job END as demo2_job,
    CASE WHEN ROW_NUMBER() OVER (PARTITION BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.country END as demo2_country
from  demo1
left join  demo2 
    on demo1.user_id = demo2.user_id
        AND demo2.user_id = 'X'
WHERE demo1.user_id = 'admin'

That's just a quick rewrite of your original sql with the addition CASE expressions included.

That being said, this sql will produce no results for demo2 since the demo2.user_id can't satisfy both conditions in this query:

  1. The join condition demo1.user_id = demo2.user_id with the where predicate of demo1.user_id = 'admin'

  2. Also hold the value X.

It's either admin and satisfies your first join condition, but fails your second. Or it's X and satisfies your second condition, but nor your first.

JNevill
  • 46,980
  • 4
  • 38
  • 63
  • Thanks but the `CASE` statements have a bracket missing so I can't run the query. I'm trying to figure it out. Please edit if you're here – Mathomatic Apr 30 '19 at 17:45
  • Oh my. Yea I really screwed that up. Just a sec. – JNevill Apr 30 '19 at 17:46
  • OK. Fixed that typo. So basically that is creating a "Window" of the result sets records where each distinct `demo1.user_id` defined a distinct window. Then in that window we order by the `demo1.id` and apply a `Row_Number()`. If it's row `1` in that window then we allow the `Demo2` results through in that record. – JNevill Apr 30 '19 at 17:47
  • Getting this error: `ERROR: syntax error at or near "," LINE 6: ...N BY demo1.user_id ORDER BY demo1.id) = 1 THEN demo2.id, ` the last comma is causing it. Strange. I added the comma before the first `CASE` already. Hmm. – Mathomatic Apr 30 '19 at 17:49
  • I can't get your query to work to test whether your answer is suitable to be accepted. – Mathomatic Apr 30 '19 at 19:16
  • Yeah. I'm just a bundle of screw ups today. Just another moment and I'll have that fixed. Hopefully for good. – JNevill Apr 30 '19 at 19:26
  • No worries, whatsoever. – Mathomatic Apr 30 '19 at 19:27
  • `ERROR: missing FROM-clause entry for table "x" LINE 12: on demo1.user_id = X.user_id` – Mathomatic Apr 30 '19 at 19:32
  • HAHA! HOT GARBAGE! Fixed. Stole your old Derived Table alias and didn't fix that in the rewrite. – JNevill Apr 30 '19 at 19:33
  • This produces all null values. :( See here: https://imgur.com/a/RD3WFgc – Mathomatic Apr 30 '19 at 19:34
  • 1
    Perfect. So that's what I was talking about in the paragraph or two below. For fun, take out that `AND demo2.user_id = 'X'` and rerun. – JNevill Apr 30 '19 at 19:35
  • Bam. Worked! Thanks a lot. I don't really understand why yet but I'll certainly spend the evening doing my best. Thanks a lot. One follow up question: Is this the most efficient approach or is it still somewhat wasteful of resources/bandwidth? +1 – Mathomatic Apr 30 '19 at 19:37
  • Not knowing your data well, I can't say this is the most efficient. BUT this is going to be pretty quick. You are joining the tables on `user_id` so if you want to squeeze performance out of this then insure you are indexing both tables on those columns. The window functions add overhead after the join is performed as that's one of the last checks. So if you need it even more performant, then just do a `SELECT *`. Of course that doesn't solve your initial ask of suppressing subsequent records per `demo1.id`. So... this is a pretty good way to go. – JNevill Apr 30 '19 at 19:40
  • It certainly can't be worse than 2 separate queries – Mathomatic Apr 30 '19 at 19:44
  • 1
    The performance impacts here are: 1) your where predicate on `user_id`. If that field is indexed, then the performance will be improved. 2) Joining the two tables. Obviously this costs. Indexes will help, but a join is a join and it's like the whole point of a database anyway. 3) The window functions. No index will save you here so your only option is to just deal with every row having data in the `demo2` fields, so it is what it is. I'm glad we got it working though! :) – JNevill Apr 30 '19 at 19:47
1

Here is another nice approach: sqlfiddle

Pavel
  • 1,627
  • 15
  • 12
  • Thanks Pavel, but that doesn't solve my problem. Your result set returns a column `employee` with a value `desk`. Obviously that's not correct. Please see my `Desired Result` above and compare it to your answer. Are you able to update your answer? I will accept it if it works. – Mathomatic Apr 30 '19 at 19:16
  • JNevill helped me out above. I'm definitely willing to try out your implementation if you have something to share, Pavel. Thank you. – Mathomatic Apr 30 '19 at 19:39
  • It was only an issue with predefined data, I've fixed it and now it is exactly the same as your desired resultset – Pavel May 01 '19 at 05:05