-1

I want everyone in the list of this temporary table and include the email address from this other table; if there is no email, show null. I also want to see their respective language if available; if not, null.

I keep getting a higher number of people which I think are repeats.

SELECT temp.*,
   a.email_addr,
   a.ibe_3103
FROM cigna_shared.tempoarytable temp
LEFT JOIN (
    SELECT DISTINCT poli.account_number,
        info.language, em.email_address
    FROM policytable poli
    LEFT JOIN languagetable info
    ON poli.id = info.id
    LEFT JOIN emailtable em
    ON poli.KEY = em.KEY
    ) a
ON temp.memberid = a.account_number; 

The total amount of people in the temporary table (76,815) and i'm getting 85k back. I can see that there are multiple poli acccount numbers in the poli table, so the email can be linked to 3 poli account numbers. i'm trying to figure out how to make it so that it all totals 76,815 with the added email address and language.

Where am I causing the replication?

philipxy
  • 14,867
  • 6
  • 39
  • 83
Rafa579
  • 19
  • 2
  • 7

1 Answers1

1

I suggest you need to - somehow - incorporate use of row_number() into your subquery, the following is an example only:

SELECT TEMP.*
    , a.email_addr
    , a.ibe_3103
FROM cigna_shared.tempoarytable TEMP
LEFT JOIN (
    SELECT poli.account_number
        , info.LANGUAGE
        , em.email_address
        , row_number() OVER (
            PARTITION BY poli.account_number ORDER BY em.email_address
            ) AS rn
    FROM policytable poli
    LEFT JOIN languagetable info ON poli.id = info.id
    LEFT JOIN emailtable em ON poli.KEY = em.KEY
    ) a ON TEMP.memberid = a.account_number
    AND rn = 1;

If you have multiple rows for an account number, and some are unwanted, you need to decide what combination of column values to use to determine which ones are used and which are not. Do this through the "over clause" in the "partitioon by" and "order by" (nb: the column references I have used above are simply a guess). Once that partition/order logic is acceptable to you then only choose the rows that have a row number of 1 via the join condition - and hence you avoid multiplying the rows of your temp table.

Also, it might be better to apply this row_number() based logic on only the policy table (i.e. as a subquery) if that is the only source of the unwanted rows. For example, let's say there is a date column in the policy table, and we camn use this to determine which policy is the most recent. This way we get only one row per person from the policy table e.g:

WITH
    cte AS (
        SELECT
            poli.id
          , poli.key
          , poli.account_number
          , ROW_NUMBER() OVER (
                PARTITION BY
                    poli.account_number
                ORDER BY
                    poli.created_date DESC -- most recent policy?
            ) AS rn
        FROM
            policytable poli
    )
SELECT
    TEMP.*
  , em.email_addr
  , info.ibe_3103
FROM
    cigna_shared.tempoarytable TEMP
    LEFT JOIN cte ON TEMP.memberid = cte.account_number AND cte.rn = 1
    LEFT JOIN languagetable info ON cte.id = info.id
    LEFT JOIN emailtable em ON cte.KEY = em.KEY
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51