3

I'm trying to flatten data from repeated fields in Big Query. I have had a look at this Querying multiple repeated fields in BigQuery, however I can't seem to get this to work.

My data looks like the following:

[
  {
    "visitorId": null,
    "visitNumber": "15",


    "device": {
      "browser": "Safari (in-app)",
      "browserVersion": "(not set)",
      "browserSize": "380x670",
      "operatingSystem": "iOS",
    },
    "hits": [
      {
        "isEntrance": "true",
        "isExit": "true",
        "referer": null,
        "page": {
          "pagePath": "/news/bla-bla-bla",
          "hostname": "www.example.com",
          "pageTitle": "Win tickets!!",
          "searchKeyword": null,
          "searchCategory": null,
          "pagePathLevel1": "/news/",
          "pagePathLevel2": "/bla-bla-bla",
          "pagePathLevel3": "",
          "pagePathLevel4": ""
        },
        "transaction": null
      }
    ]
  }
]

What I want is the fields in the hits-page repeated fields.

For instance i want to fetch the hits.page.pagePath (with the value "/news/bla-bla-bla")

I have tried with the following query, but i get an error:

SELECT 
visitorId,
visitNumber,
device.browser,
hits.page.pagePath
FROM 
`Project.Page`
LIMIT 1000

The error i'm getting is this

Error: Cannot access field page on a value with type ARRAY<STRUCT<hitNumber INT64, time INT64, hour INT64, ...>> 
Tiawy
  • 175
  • 5
  • 11

1 Answers1

6

In ga_sessions schema, the field hits is represented as an ARRAY type.

Usually when working with this type field you need to apply the UNNEST operation in order to open the array.

Specifically, in the FROM clause, you can apply a CROSS JOIN (you unnest arrays by applying a cross join operation, which can be represented as a comma followed by the UNNEST function) like so:

SELECT 
  visitorId,
  visitNumber,
  device.browser,
  hits.page.pagePath
FROM `Project.Page`,
UNNEST(hits) hits
LIMIT 1000

If you want specific pagePaths, you can filter them out like so:

SELECT 
  visitorId,
  visitNumber,
  device.browser,
  hits.page.pagePath
FROM `Project.Page`,
UNNEST(hits) hits
WHERE regexp_contains(hits.page.pagePath, r'/news/bla-bla-bla')
LIMIT 1000

Make sure to follow through BigQuery documentation on this topic, it's really well written and you'll learn a lot on new techniques to process big data.

Willian Fuks
  • 11,259
  • 10
  • 50
  • 74
  • This [blog post](https://firebase.googleblog.com/2017/03/bigquery-tip-unnest-function.html) gives more information. – Steren Dec 22 '17 at 03:54