0

Hi I am working on the lt-pcf-analytics-exp.90676036.ga_sessions_* table and I need to extract different variables along including all variables from the nested hits columns except the columns hits.customDimensions.value and hits.customDimensions.index. I think both hits and hits.customDimensions are ARRAY. How can I do this in Standard SQL?

I found already a question on a similar problem (BigQuery except double nested column), but in my case I have a double nested array column and I am not able do adapt the code.

Basically, this is what I want to extract. How can I modify it so I will exclude hits.customDimensions.value and hits.customDimensions.index? Thank you.

SELECT fullVisitorId,
   visitId,
   visitNumber,
   cd.value as PCF_CUST_ID,
   date,
   TIMESTAMP_SECONDS(visitStartTime) as visitStartTime,
   totals.visits as visits,
   totals.hits as total_hits,
   hits.* (EXCEPT hits.customDimensions.value and hits.customDimensions.index)
FROM `lt-pcf-analytics-exp.90676036.ga_sessions_*` as t
          left join unnest(customDimensions) as cd
          left join unnest(hits) as hits
WHERE _TABLE_SUFFIX between '20210101' and '20210131' 
   and cd.index = 4 and cd.value is not null
ORDER BY PCF_CUST_ID, visitStartTime, hitNumber
Latra
  • 492
  • 3
  • 14

2 Answers2

0

If you check the BigQuery documentation for except, this is not the good syntax:

SELECT [ AS { typename | STRUCT | VALUE } ] [{ ALL | DISTINCT }]
    { [ expression. ]* [ EXCEPT ( column_name [, ...] ) ]
        [ REPLACE ( expression [ AS ] column_name [, ...] ) ]
    | expression [ [ AS ] alias ] } [, ...]

So, use it like this:

SELECT hits.* EXCEPT (value, index)
0

Like @martinus noticed, your except syntax is not correct. If you take a look to the BigQuery Documentation, you will see that the correct way to run a Query with an except is:

SELECT 
   field.* EXCEPT (nested_field1, nested_field2)
FROM `my_table`

Nevertheless, you can not directly use EXCEPT on a nested field directly. As a workaround you can exclude all the hits.customDimensions values from hits.*, and then SELECT for only hits.customDimensions.* and then exclude the nested elements that you need to remove, like index and value.

A query like the following should work:

SELECT fullVisitorId,
   visitId,
   visitNumber,
   cd.value as PCF_CUST_ID,
   date,
   TIMESTAMP_SECONDS(visitStartTime) as visitStartTime,
   totals.visits as visits,
   totals.hits as total_hits,
   hits.* EXCEPT (hits.customDimensions),
   hits.customDimensions.* EXCEPT (index, value)
FROM `lt-pcf-analytics-exp.90676036.ga_sessions_*` as t
          left join unnest(customDimensions) as cd
          left join unnest(hits) as hits
WHERE _TABLE_SUFFIX between '20210101' and '20210131' 
   and cd.index = 4 and cd.value is not null
ORDER BY PCF_CUST_ID, visitStartTime, hitNumber
Latra
  • 492
  • 3
  • 14