0

I'm trying to unnest the line_items in my table. However, I have to remove duplicates from our table and it's throwing off what should normally work.

This is the query I use to eliminate duplicates:

   SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
       FROM `shopify.orders` )
      -- identify duplicate rows
   where instance = 1

However, when I try to unnest the line_items in this query, it's no longer able to partition by id

   SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
       FROM `shopify.orders`, unnest(line_items) as items )
      -- identify duplicate rows
   where instance = 1

Any ideas how I can eliminate duplicates while also unnesting the line_items?

Without removing duplicates, this is what my data looks like: enter image description here

When removing duplicates, this is what it looks like: enter image description here

HossBender
  • 1,019
  • 2
  • 10
  • 23
  • 1
    Please provide sample data and desired results, so it is clear what you mean by "duplicate". – Gordon Linoff Jun 17 '20 at 16:03
  • @GordonLinoff you can see the duplicate removal a bit better here: https://stackoverflow.com/questions/36675521/delete-duplicate-rows-from-a-bigquery-table – HossBender Jun 17 '20 at 16:49

1 Answers1

2

You can first do dedup and then unnest result - something as in below example

SELECT * EXCEPT(instance, line_items) FROM (
  SELECT *, ROW_NUMBER() OVER(PARTITION BY id) AS instance
  FROM `shopify.orders` 
), UNNEST(line_items) as item
   -- identify duplicate rows
WHERE instance = 1
Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230