-2

Why is this sub query bringing back NULL values when the table learner_employment contains no NULL values in the EMPLOYMENT_STATUS field? If I run it as as standalone query on any person code I get values. It seems to be an issue with the 'top 1' command, as there are values if I remove this. However I require only the earliest employment record.

select 
p.PERSON_CODE,
EMPLOYMENT_STATUS,
EMPLOYMENT_INTENSITY,
DATE_STATUS_APPLIES
from people p
left join 
(select top 1 PERSON_CODE, 
EMPLOYMENT_STATUS, 
EMPLOYMENT_INTENSITY, 
DATE_STATUS_APPLIES 
from learner_employment
order by DATE_STATUS_APPLIES) emp 
on emp.PERSON_CODE = p.PERSON_CODE
Amira Bedhiafi
  • 8,088
  • 6
  • 24
  • 60
Joe G
  • 1
  • 1
  • Because the value of `PERSON_CODE` of the `TOP (1)` row in the table `learner_employment` doesn't match any of the rows in your table `people`. When no related row is found `NULL` is displayed in the column's value. – Thom A Jul 11 '23 at 13:24
  • 1
    Are you, perhaps, after a *correlated* subquery? You'll need to use `APPLY` for that, not a `JOIN`. – Thom A Jul 11 '23 at 13:25
  • Hi @Thom, thanks for the reply. The subquery is joined to the **people** table on the value of **PERSON_CODE**, and I know that all the learners in the **people** table have at least one record in **learner_employment**. Therefore I don't understand why the **Top 1** command returns NULL values? – Joe G Jul 11 '23 at 14:39
  • Yes, the sub query is `JOIN`ed and that subquery only returns *one row*, for a single person. So all your other people don't have a related row from that subquery. – Thom A Jul 11 '23 at 14:48
  • If you run your subquery, `select top 1 PERSON_CODE, EMPLOYMENT_STATUS, EMPLOYMENT_INTENSITY, DATE_STATUS_APPLIES from learner_employment order by DATE_STATUS_APPLIES`, you'll see what that one row returned is. If you supply sample data and expected result, this would be *much* easier to explain. – Thom A Jul 11 '23 at 14:51
  • Thanks @Thom A, but every **person_code (learner)** has at least one record in the **learner_employment** table. If I run the subquery without the **Top 1** command, every learner will have at least one record with an **EMPLOYMENT_STATUS**. This confirms that the join to the **People** table does work. However some learners have multiple employment records, hence the need for the **Top 1**, but this appears to not work within the subquery. Any further advise much appreciated. – Joe G Jul 11 '23 at 15:38
  • But you *aren't* running the query without a `TOP (1)`. YOu are asking for the `TOP 1` row in the table `learner_employment` and then `LEFT JOIN`ing that to the table `people`. Again, I *assume* you want a correlated query but you never answered that comment nor do we have any sample data or expected results. The query is working as expected and as written; the problem is we don't know what *you* expect it to do as we have no sample data or expected results. – Thom A Jul 11 '23 at 15:41
  • Sorry I didn't respond to the correlated query comment - which I'm not familiar with - but I first wanted to address your other comment as it didn't make sense to me. Forgive me this is my very first posted question on stackoverflow. I thought my query and explanation provided a fairly clear example, but I will provide more specifics in next comment. Perhaps you'll be then be able to kindly advise on the best way to get the expected result. Thanks again – Joe G Jul 11 '23 at 15:49
  • The problem is that `top 1` is applied (logically) before it gets to the join. So in the end you're getting lookup data for only one person while everybody else will subsequently fail to match. – shawnt00 Jul 11 '23 at 16:28
  • Thanks @shawnt00. However I'm not sure this is correct, as all records return NULL values. See my additional detail below with more specific examples for one learner. – Joe G Jul 11 '23 at 16:38
  • And this very same explanation is precisely why you get a value when you've narrowed the subquery to return the row based on an id. That filtering is happening *prior* to a top 1 limitation. You either haven't scrolled through all the data or you're doing some kind of other filtering later that wasn't included in the question. try adding `order by p.person_code desc` at the end. – shawnt00 Jul 11 '23 at 16:44
  • I'm doing neither. I'm pretty comfortable with SQL code and have added many subqueries before. Having tested queries in some other tables just now, I'm sure the issue is specific to the **top** function in subqueries. I will have a further play in the morning, but if you have any further suggestions in the meantime they will be hugely welcome. Thanks – Joe G Jul 11 '23 at 16:59
  • You're highly confident that the subquery/top 1 is not resulting in null values--I buy that. The origin of the null values is the outer join, not the subquery itself, that fails to match any rows from the derived table that only contains a single row by its definition. The easiest way to prove is probably just to change to an inner join after which you'll see the singular result. – shawnt00 Jul 11 '23 at 17:08
  • The inner join returned no rows at all. Thankfully the correlated 'Outer Apply' query which @Thom A suggested worked (after some trial and error!). Thanks for your time anyway. – Joe G Jul 11 '23 at 17:41
  • Yes, the correlated query/cross apply is exactly the solution to the same problem we've been trying to describe to you this whole time. If you really want to learn from this then go back and re-read every comment and understand what was being explained. – shawnt00 Jul 11 '23 at 19:23
  • I spent most of the afternoon considering the comments and working through suggestions, eventually landing on the correlated query after trying it initially with the MRE provided by the other contributor. Your comment feels unnecessarily negative and somewhat condescending. – Joe G Jul 12 '23 at 10:18
  • You were quite resistant to all the gentle nudging and attempts to tease out more information that would facilitate helping you out. Nothing else I can do but move on when there's little incentive. Yeah, I was a bit exasperated. – shawnt00 Jul 12 '23 at 22:44

1 Answers1

0

I suspect the problem is your misunderstanding of how the JOIN is working. I'm going to provide a MRE here, as your question lacks one. Firstly some sample data:

CREATE TABLE dbo.SomeTable (SomeID int,
                            SomeValue varchar(10));
GO
CREATE TABLE dbo.AnotherTable (AnotherID int,
                               SomeID int,
                               AnotherDate date);
GO

INSERT INTO dbo.SomeTable (SomeID,
                           SomeValue)
VALUES(1,'abc'),
      (2,'def'),
      (3,'ghi'),
      (4,'xyz');
GO
INSERT INTO dbo.AnotherTable (AnotherID,
                              SomeID,
                              AnotherDate)
VALUES (1, 1, GETDATE()),
       (2, 1, GETDATE() + 1),
       (3, 3, GETDATE() + 4),
       (4, 3, GETDATE() + 2),
       (5, 3, GETDATE() - 1),
       (6, 4, GETDATE() + 3);
GO

Now lets write a query that represents what your does:

SELECT S.SomeID,
       S.SomeValue,
       A.AnotherDate
FROM dbo.SomeTable S
     LEFT JOIN (SELECT TOP (1)
                       sq.SomeID,
                       sq.AnotherDate
                FROM dbo.AnotherTable sq
                ORDER BY sq.AnotherDAte DESC) A ON S.SomeID = A.SomeID;

This returns the following dataset:

SomeID SomeValue AnotherDate
1 abc NULL
2 def NULL
3 ghi 2023-07-15
4 xyz NULL

This is expected, the subquery returns one row and then that is LEFT JOINed onto. We can see what the subquery would return with the following:

SELECT TOP (1)
       sq.SomeID,
       sq.AnotherDate
FROM dbo.AnotherTable sq
ORDER BY sq.AnotherDate DESC;

Which, unsurprisingly, returns the following:

SomeID AnotherDate
3 2023-07-15

This is because SomeID 3 has the row with the highest value of AnotherDate and so in the priordata set only SomeID 3 has a value in AnotherDate (1, 2, and 4 aren't equal to 3 so the LEFT JOIN doesn't return a row).


Perhaps what you want is instead of a JOIN is a correlated query. You need to use APPLY for this. As you have a LEFT JOIN, then presumably you need an OUTER APPLY. This would then return the TOP (1) row for each correlation:

SELECT S.SomeID,
       S.SomeValue,
       A.AnotherDate
FROM dbo.SomeTable S
    OUTER APPLY (SELECT TOP (1)
                      sq.AnotherDate
               FROM dbo.AnotherTable sq
               WHERE sq.SomeID = S.SomeID
               ORDER BY sq.AnotherDate DESC) A;

Which returns the following:

SomeID SomeValue AnotherDate
1 abc 2023-07-12
2 def NULL
3 ghi 2023-07-15
4 xyz 2023-07-14

Though if this is your goal,you should just use a MAX (not sure that's the case here mind):

SELECT S.SomeID,
       S.SomeValue,
       MAX(A.AnotherDate) AS AnotherDate
FROM dbo.SomeTable S
    LEFT JOIN dbo.AnotherTable A ON S.SomeID = A.SomeID
GROUP BY S.SomeID,
         S.SomeValue;

Alternatively, you can use the Get top 1 row of each group solution.

Thom A
  • 88,727
  • 11
  • 45
  • 75
  • Thom I managed to spend some further time after work trying out the correlated query. I had previously applied this to my script, but it brought back the same value for every learner. However, once I followed the exact format in your examples I'm happy to report it successfully retuned one record from **learner_employment** per learner. Thank you so much for taking the time to offer such considered and detailed explanation. Really appreciated, I'd hit a brick wall with it. – Joe G Jul 11 '23 at 17:34