6

I have two tables. One is a Reference table, used to sort priority, and one is a Customer table. The Reference table is used to give priority to each column in the Customer table, to give a different order for the individual columns for a single customer.

Reference Table:

---------------------------------------
| Priority |   Attribute |  sourceID  |
---------------------------------------
|   1      |     EMAIL   |      1     |
|   2      |     EMAIL   |      2     |
|   3      |     EMAIL   |      3     |
|   2      |     NAME    |      1     |
|   1      |     NAME    |      2     |
|   3      |     NAME    |      3     |
---------------------------------------

Customer table:

-----------------------------------------------------------------------
| CustomerID |  Name   |       Email        |  SourceID |     Date    |
-----------------------------------------------------------------------
|    1       |  John   |       NULL         |     1     |  03/01/2017 |
|    1       |  NULL   |   John@email.com   |     3     |  01/01/2017 |
|    1       |   J     |  J.Smith@email.com |     2     |  02/01/2017 |
-----------------------------------------------------------------------

Result:

---------------------------------------------
| CustomerID   |  Name  |       Email       |
---------------------------------------------
|      1       |  John  | J.Smith@email.com |
---------------------------------------------

At the moment I'm using the following query to do this:

SELECT DISTINCT
       FIRST_VALUE(c.Name IGNORE NULLS) 
           OVER (PARTITION BY p.customerID 
                 ORDER BY r.PRIORITY, c.DATE 
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS EMAIL,
      FIRST_VALUE(c.Email IGNORE NULLS) 
           OVER (PARTITION BY c.customerID 
                 ORDER BY r.PRIORITY, c.DATE 
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS EMAIL
FROM Customer c
  JOIN reference r ON c.sourceID = r.sourceID;

However, this does take the different attributes for each column into consideration. I need to add a filter of some sort to each of the partitions by parts.

Can anyone assist with how I can go about doing this?

CrashBandicoot
  • 399
  • 2
  • 7
  • 21
alwaystrying
  • 163
  • 1
  • 1
  • 9

1 Answers1

5

One method is to put the attributes for a customer in a column and then recombine them:

SELECT DISTINCT customerId
       first_value(CASE WHEN ca.attribute = 'NAME' THEN ca.val end) OVER
           (PARTITION BY ca.customerId, attribute ORDER BY r.priority, ca.date) AS name,
       first_value(CASE WHEN ca.attribute = 'EMAIL' THEN ca.val END) OVER
           (PARTITION BY ca.customerId, attribute ORDER BY r.priority, ca.date) AS email
FROM ((SELECT customerId, 'NAME' AS attribute, name AS val, sourceId, date
       FROM customer c
      ) UNION ALL
      (SELECT customerId, 'EMAIL' AS attribute, email AS val, sourceId, date
       FROM customer c
      )
     ) ca JOIN
     reference r
     ON r.sourceId = ca.sourceId AND r.attribute = ca.attribute;

Note that this uses SELECT DISTINCT instead of GROUP BY. I don't think that Netezza has a first_value() aggregation function, so this construct gets around that problem.

CrashBandicoot
  • 399
  • 2
  • 7
  • 21
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786