0

I would like to keep just the earliest record of every ID in a table where the dates are in yyyy-mm-dd format. If I have two or more records on the same day, I just want to take one and I do not care what of them.

I tried to join the table with itself but the left join is not working and returns more than one.

Example original table:

ID_vendor sales office dt
1 3 A 2021-10-12
1 50 B 2021-10-13
2 109 H 2021-10-13
3 110 H 2021-10-05
4 111 N 2021-10-13
4 112 W 2021-10-13
4 113 B 2021-10-13

Expected result:

ID_vendor sales office
1 3 A
2 109 H
3 110 H
4 111 N

Y try using over partition without luck, and now I am stuck here with the LEFT JOIN returning an OUTER JOIN

Any help is welcome. Here the code:

WITH t as (
    SELECT id_vendor
        , sales 
        , office 
        , min(dt) fst_date
    FROM test_table
    WHERE dt >= date('2021-09-12')
    -- AND id_vendor = '1004618231015'
    GROUP BY id_vendor, sales, office 
    ORDER BY id_vendor
)
, b AS (
SELECT id_vendor
        , sales 
        , office
        , dense_rank() over (order by fst_date) as rnk
FROM t
-- WHERE id_vendor = '1004618231015'
GROUP BY id_vendor
        , sales 
        , office
        , fst_date
        )
, c AS (
SELECT id_vendor
FROM b WHERE rnk = 1
GROUP BY id_vendor
)
, d AS (
SELECT id_vendor
    , sales
    , office
FROM b WHERE rnk = 1)
)
SELECT c.id_vendor
    , d.sales
    , d.office
FROM c
LEFT join d
    ON c.id_vendor = d.id_vendor
Pin Eipol
  • 43
  • 5
  • I have no idea why you did many select? I think you can do that by on query one your table using ``row_number`` – sa-es-ir Apr 06 '22 at 18:34
  • There is no "outer join"; there are left, right & full outer joins. Tables have no row order, result sets have row order; ORDER by without LIMIT/TOP does nothing when not at the outermost level. – philipxy Apr 06 '22 at 18:38
  • Please in code questions give a [mre]--cut & paste & runnable code & example input; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy Apr 06 '22 at 18:39
  • "keep just the earliest record of every ID" This is a faq. Please before considering posting read the manual/reference & google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular names/strings/numbers, 'site:stackoverflow.com' & tags; read many answers. If asking reflect research. [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) PS "outer join"--cross join? PS If there are rows you don't want, don't ask for them. What condition do desired rows satisfy? – philipxy Apr 06 '22 at 18:58
  • [Fetch the row which has the Max value for a column](https://stackoverflow.com/q/121387/3404097) – philipxy Apr 06 '22 at 18:59

2 Answers2

1

Simply you can use Row_number for getting your expected result like this:

select id_vendor, sales , office from (
SELECT id_vendor
        , sales 
        , office 
        ,Row_number() over(partition by id_vendor order by dt) rw
    FROM test_table ) t
where t.rw=1
sa-es-ir
  • 3,722
  • 2
  • 13
  • 31
  • This will not result in desired output - no need to partition by `dt` – Guru Stron Apr 06 '22 at 18:46
  • @GuruStron but OP asked on the same day, what if two same id_vendor in two days? – sa-es-ir Apr 06 '22 at 18:51
  • Please read the question carefully. And check the expected output. – Guru Stron Apr 06 '22 at 18:54
  • @GuruStron OP said ``If I have two or more records on the same day, I just want to take one`` by this sentence maybe I should partition only by dt ! – sa-es-ir Apr 06 '22 at 18:58
  • 1
    _"I would like to keep just the earliest record of every ID "_. Since `dt` is date, not timestamp the part you quoted is used to "resolve" the "collision" with "use any". Please check the sample data and expected result - your query will 5 rows while 4 is expected (1 has data in multiple days) . – Guru Stron Apr 06 '22 at 19:05
  • 1
    @GuruStron Yes you are right and answer updated, Thanks – sa-es-ir Apr 06 '22 at 19:05
1

Join is not required for described task, just using row_number with partitioning by ID_vendor in subselect/cte should do the trick:

-- sample data
WITH dataset (ID_vendor, sales, office, dt) AS (
    VALUES (1, 3, 'A', date '2021-10-12'),
        (1, 50, 'B', date '2021-10-13'),
        (2, 109, 'H', date '2021-10-13'),
        (3, 110, 'H', date '2021-10-05'),
        (4, 111, 'N', date '2021-10-13'),
        (4, 112, 'W', date '2021-10-13'),
        (4, 113, 'B', date '2021-10-13')
) 

-- query
select id_vendor,
    sales,
    office
from (
        select *,
            row_number() over (partition by id_vendor order by dt) rnk
        from dataset
    )
where rnk = 1
order by id_vendor

Output:

id_vendor sales office
1 3 A
2 109 H
3 110 H
4 111 N
Guru Stron
  • 102,774
  • 10
  • 95
  • 132