Getting a value from a column different to the column used in a MIN
/MAX
expression in a GROUP BY
query still remains a surprisingly difficult thing to do in SQL, and while modern versions of the SQL language (and SQL Server) make it easier, they're completely non-obvious and counter-intuitive to most people as it necessarily involves more advanced topics like CTEs, derived-tables (aka inner-queries), self-joins and windowing-functions despite the conceptually simple nature of the query.
Anyway, as-ever in modern SQL, there's usually 3 or 4 different ways to accomplish the same task, with a few gotchas.
Preface:
As Site
, Date
, Year
, and Month
are all keywords in T-SQL, I've escaped them with double-quotes, which is the ISO/ANSI SQL Standards compliant way to escape reserved words.
- SQL Server supports this by default. If (for some ungodly reason) you have
SET QUOTED IDENTIFIER OFF
then change the double-quotes to square-brackets: []
I assume that the Site
column in both tables is just a plain' ol' data column, as such:
- It is not a
PRIMARY KEY
member column.
- It should not be used as a
GROUP BY
.
- It should not be used in a
JOIN
predicate.
All of the approaches below assume this database state:
CREATE TABLE "Employee" (
"Site" int NOT NULL,
WorkTypeId char(2) NOT NULL,
Emp_NO int NOT NULL,
"Date" date NOT NULL
);
CREATE TABLE "PTO" (
"Site" int NOT NULL,
WorkTypeId char(2) NULL,
Emp_NO int NOT NULL,
"Date" date NOT NULL
);
GO
INSERT INTO "Employee" ( "Site", WorkTypeId, Emp_NO, "Date" )
VALUES
( 5015, 'MB', 1005, '2022-02-01' ),
( 5015, 'MI', 1005, '2022-02-04' ),
( 5015, 'PO', 1005, '2022-02-04' ),
( 5015, 'ME', 2003, '2022-01-01' ),
( 5015, 'TT', 2003, '2022-01-10' );
INSERT INTO "PTO" ( "Site", WorkTypeId, Emp_NO, "Date" )
VALUES
( 5015, NULL, 1005, '2022-02-03' ),
( 5015, NULL, 1005, '2022-02-14' ),
( 5014, NULL, 2003, '2022-01-09' );
- Both approaches define CTEs
e
and p
that extend Employee
and PTO
respectively to add computed "Year"
and "Month"
columns, which avoids having to repeatedly use YEAR( "Date" ) AS "Year"
in GROUP BY
and JOIN
expressions.
- I suggest you add those as computed-columns in your base tables, if you're able, as they'll be useful generally anyway. Don't forget to index them appropriately too.
Approach 1: Composed CTEs with elementary aggregates, then UPDATE
:
WITH
-- Step 1: Extend both the `Employee` and `PTO` tables with YEAR and MONTH columns (this simplifies things later on):
e AS (
SELECT
Emp_No,
"Site",
WorkTypeId,
"Date",
YEAR( "Date" ) AS "Year",
MONTH( "Date" ) AS "Month"
FROM
Employee
),
p AS (
SELECT
Emp_No,
"Site",
WorkTypeId,
"Date",
YEAR( "Date" ) AS "Year",
MONTH( "Date" ) AS "Month"
FROM
PTO
),
-- Step 2: Get the MIN( "Date" ) value for each group:
minDatesForEachEmployeeMonthYearGroup AS (
SELECT
e.Emp_No,
e."Year",
e."Month",
MIN( "Date" ) AS "FirstDate"
FROM
e
GROUP BY
e.Emp_No,
e."Year",
e."Month"
),
-- Step 3: INNER JOIN back on `e` to get the first WorkTypeId in each group:
firstWorkTypeIdForEachEmployeeMonthYearGroup AS (
/* WARNING: This query will fail if multiple rows (for the same Emp_NO, Year and Month) have the same "Date" value. This can be papered-over with GROUP BY and MIN, but I don't think that's a good idea at all). */
SELECT
e.Emp_No,
e."Year",
e."Month",
e.WorkTypeId AS FirstWorkTypeId
FROM
e
INNER JOIN minDatesForEachEmployeeMonthYearGroup AS q ON
e.Emp_NO = q.Emp_NO
AND
e."Date" = q.FirstDate
)
-- Step 4: Do the UPDATE.
-- *Yes*, you can UPDATE a CTE (provided the CTE is "simple" and has a 1:1 mapping back to source rows on-disk).
UPDATE
p
SET
p.WorkTypeId = f.FirstWorkTypeId
FROM
p
INNER JOIN firstWorkTypeIdForEachEmployeeMonthYearGroup AS f ON
p.Emp_No = f.Emp_No
AND
p."Year" = f."Year"
AND
p."Month" = f."Month"
WHERE
p.WorkTypeId IS NULL;
Here's a screenshot of SSMS showing the contents of the PTO
table from before, and after, the above query runs:

Approach 2: Skip the self-JOIN
with FIRST_VALUE
:
This approach gives a shorter, slightly simpler query, but requires SQL Server 2012 or later (and that your database is running in compatibility-level 110 or higher).
Surprisingly, you cannot use FIRST_VALUE
in a GROUP BY
query, despite its obvious similarities with MIN
, but an equivalent query can be built with SELECT DISTINCT
:
WITH
-- Step 1: Extend the `Employee` table with YEAR and MONTH columns:
e AS (
SELECT
Emp_No,
"Site",
WorkTypeId,
"Date",
YEAR( "Date" ) AS "Year",
MONTH( "Date" ) AS "Month"
FROM
Employee
),
firstWorkTypeIdForEachEmployeeMonthYearGroup AS (
SELECT
DISTINCT
e.Emp_No,
e."Year",
e."Month",
FIRST_VALUE( WorkTypeId ) OVER (
PARTITION BY
Emp_No,
e."Year",
e."Month"
ORDER BY
"Date" ASC
) AS FirstWorkTypeId
FROM
e
)
-- Step 3: UPDATE PTO:
UPDATE
p
SET
p.WorkTypeId = f.FirstWorkTypeId
FROM
PTO AS p
INNER JOIN firstWorkTypeIdForEachEmployeeMonthYearGroup AS f ON
p.Emp_No = f.Emp_No
AND
YEAR( p."Date" ) = f."Year"
AND
MONTH( p."Date" ) = f."Month"
WHERE
p.WorkTypeId IS NULL;
Doing a SELECT * FROM PTO
after this runs gives me the exact same output as Approach 2.
Approach 2b, but made shorter:
Just so @SOS doesn't feel too smug about their SQL being considerably more shorter than mine , the Approach 2 SQL above can be compacted down to this:
WITH empYrMoGroups AS (
SELECT
DISTINCT
e.Emp_No,
YEAR( e."Date" ) AS "Year",
MONTH( e."Date" ) AS "Month",
FIRST_VALUE( e.WorkTypeId ) OVER (
PARTITION BY
e.Emp_No,
YEAR( e."Date" ),
MONTH( e."Date" )
ORDER BY
e."Date" ASC
) AS FirstWorkTypeId
FROM
Employee AS e
)
UPDATE
p
SET
p.WorkTypeId = f.FirstWorkTypeId
FROM
PTO AS p
INNER JOIN empYrMoGroups AS f ON
p.Emp_No = f.Emp_No
AND
YEAR( p."Date" ) = f."Year"
AND
MONTH( p."Date" ) = f."Month"
WHERE
p.WorkTypeId IS NULL;
- The execution-plans for both Approach 2 and Approach 2b are almost identical, excepting that Approach 2b has an extra Computed Scalar step for some reason.
- The execution plans for Approach 1 and Approach 2 are very different, however, with Approach 1 having more branches than Approach 2 despite their similar semantics.
- But my execution-plans won't match yours because it's very context-dependent, especially w.r.t. what indexes and PKs you have, and if there's any other columns involved, etc.
Approach 1's plan looks like this:

Approach 2b's plan looks like this:

@SOS's plan, for comparison, is a lot simpler... and I honestly don't know why, but it does show how good SQL Server's query optimizer is thesedays:
