1

I have a (JSON) String column like this:

[{"someValue":6,"other":true, "maybeThisValue":"exists?"},{"someValue": "cool value", "other": null}]

I'd like to end up with this:

+------------+-------+----------------+
| someValue  | other | maybeThisValue |
+------------+-------+----------------+
| 6          | true  | exists?        |
+------------+-------+----------------+
| cool value |       |                |
+------------+-------+----------------+

because the schema changes so much I would like to get the keys of the string and dynamically generate a record from it (no arrays or structs). The input JSON will always be an array of objects with single depth key-value (no childNodes).

I'm thinking maybe this isn't possible with one UDF because we have to specify the return value in advance and there could be additional columns. But maybe this is possible with two UDF? Can one UDF generate and save then execute another UDF?

Or maybe I can start with a UDF which will give me the unique keys of the (JSON) strings and then write a dbt macro which will use the return of that function call?? (I'm not sure this is possible)

I know how to do this in postgres but I'm wondering if you have any ideas to get it working in BigQuery or if it can be done in UDF that would be really cool even if it's a little slow

Converting data types is a pain but if it could detect the first 5 columns are all numbers and safe_cast to int that would be really really cool... I'm not sure how to do that though so string is fine

similar to this:

https://dba.stackexchange.com/questions/151838/postgresql-json-column-to-view

https://github.com/fishtown-analytics/dbt-utils/blob/master/macros/sql/get_column_values.sql

maybe it is possible with custom JSONPath '*.*': What JsonPath expressions are supported in BigQuery?

maybe it's not possible without array<struct> :/

maybe this is possible with BigQuery scripting?? https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting

maybe I can use this for part of it How to extract all the keys in a JSON object with BigQuery

I can get the values like this...

CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(json).map(x=>JSON.stringify(x));
"""; 

CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
        return jsonPath(JSON.parse(json), json_path);
"""
OPTIONS (
    library="gs://your_bucket/jsonpath-0.8.0.js"
);

select SPLIT(CUSTOM_JSON_EXTRACT(jsoncolumn,'*.'))
FROM `db.master`,
unnest(json2arrayUDF(JSON_EXTRACT(jsoncolumn, "$"))) jsoncolumn

it looks like:

|1 | 6 |
|  | true |
|  | exists? |
|2 | cool value |
|  |  |
|  |  |

now I just need to get the keys and then somehow match up the two things...

jaksco
  • 423
  • 7
  • 18
  • It's a good request, but right now while working in SQL you need to already have the structure of the resulting rows defined. – Felipe Hoffa Mar 05 '20 at 22:36
  • 1
    I decided to create a script which will generate the required SQL query using a lookup to the target table's jsonstring column and doing a UDF with Object.keys() – jaksco Mar 07 '20 at 07:31
  • @jaksco Would you mind sharing your script? I'm running into the exact same issue. Thanks. – Steve G Jan 05 '21 at 23:05
  • I don't have access to the script anymore but you can do something similar to this https://stackoverflow.com/questions/34890339/how-to-extract-all-the-keys-in-a-json-object-with-bigquery – jaksco Jan 07 '21 at 00:08
  • also I don't actually recommend parsing strings within BigQuery. it's too slow for Big Data. Using the STRUCTS and arrays that BQ automatically creates on import will lead you to the most performant solution. You just need to unpack them https://stackoverflow.com/questions/45865903/how-to-flatten-a-struct-in-bigquery-standard-sql – jaksco Jan 07 '21 at 00:10

0 Answers0