1

I have a column in my BigQuery with various different messages in a simple one-depth JSON format which I would like to extract into a STRUCT. The input table looks like

Input table

and should be transformed into

Desired Output

I am aware of BigQuery json-functions such as JSON_EXTRACT as posted for example here. However, this approach is out of question since in production there like 100 different senders. Therefore, I need to be able to extract those JSONs dynamically without specifying their keys manually.

I have been playing around with regex as shown here

    WITH input_table AS (
    SELECT
        1 AS Row,
        20210101 AS Date,
        'Sender1' AS Sender,
        '{"param1": 123, "param2": 456, "param3": 78,  "value1": 42, "label1": "hello", "timestamp": 1234567890}' AS Message
    UNION ALL SELECT
        2 AS Row,
        20210101 AS Date,
        'Sender2' AS Sender,
        '{"value1": 4, "label1": "myLabel", "label2": "yourLabel"}' AS Message
    UNION ALL SELECT
        3 AS Row,
        20210102 AS Date,
        'Sender1' AS Sender,
        '{"param1": 12, "param2": 90, "param3": 55,  "value1": 11, "label1": "there", "timestamp": 1235555555}' AS Message
    )
    SELECT
    CONCAT("SELECT ", key, " AS key, JSON_EXTRACT_SCALAR(Message, '$.", key, "') AS ", key, " FROM input_table")
    FROM input_table, unnest(regexp_extract_all(regexp_replace(JSON_EXTRACT(Message, '$'), r':{.*?}+', ''), r'"(.*?)":')) key

but apart from the fact that my regex is still a bit off, I am struggling to convert these statements into a STRUCT.

Majid Hajibaba
  • 3,105
  • 6
  • 23
  • 55
CaptainNabla
  • 563
  • 1
  • 4
  • 12

1 Answers1

1

This is a very tricky question AFAIK it's not practically possible to populate dynamic params into the RECORD STRUCT as its sub-schema must be predefined on table/query level. I have followed something similar to firebase bigquery event/user parameter schema (with key & value pairs) using javascript function for you to get started with.

CREATE TEMP FUNCTION processJson(input STRING)
RETURNS ARRAY<STRUCT<key STRING, value STRING>>
LANGUAGE js AS """
var obj = JSON.parse(input);
var keys = Object.keys(obj);
var arr = [];
for (i = 0; i < keys.length; i++) {
    arr.push({'key': keys[i], 'value': JSON.stringify(obj[keys[i]])});
}
return arr;
""";
WITH
  input_table AS (
  SELECT
    1 AS ROW,
    20210101 AS Date,
    'Sender1' AS Sender,
    '{"param1": 123, "param2": 456, "param3": 78,  "value1": 42, "label1": "hello", "timestamp": 1234567890}' AS Message
  UNION ALL
  SELECT
    2 AS ROW,
    20210101 AS Date,
    'Sender2' AS Sender,
    '{"value1": 4, "label1": "myLabel", "label2": "yourLabel"}' AS Message
  UNION ALL
  SELECT
    3 AS ROW,
    20210102 AS Date,
    'Sender1' AS Sender,
    '{"param1": 12, "param2": 90, "param3": 55,  "value1": 11, "label1": "there", "timestamp": 1235555555}' AS Message )
SELECT
  ROW,
  date,
  sender,
  processJson(Message) message
FROM
  input_table

enter image description here

Simple query to filter:

SELECT
  *
FROM
  final_view , unnest(message) message
  WHERE message in (('param1', '123'), ('param2', '90'), ('value1', '4'))
Logan
  • 1,331
  • 3
  • 18
  • 41
  • I feared it isn't that simple.. I think I'll try and shift that issue to Python during the ETL process. But thanks anyway! :) – CaptainNabla Aug 11 '21 at 05:23