2

I am looking for a custom function that can accept a STRUCT and return some string output of it. For this example, let us just assume the types of the struct may be a STRING, INT64, or a DATE. So we have something like this:

STRUCT(1 AS num, "hi" as str, DATE "2014-01-01" as date)

And in the output I would like to produce the following JSON obj, which is a list of triples -- [key, value, type], with the value being the json-encoded value.

[
    ["num", 1, "INT64"],
    ["str", "hi", "STRING"],
    ["date", "2014-01-01", "DATE"]
]

How could I do this in BigQuery? So far I have some scaffolding down, but not quite sure how to get the "inspect" for the type or key of a value.

CREATE TEMP FUNCTION structFunc(obj ANY TYPE)
RETURNS json
AS (
  JSON "[]"
);

SELECT
  structFunc(STRUCT(1 as num,"x" as str, date "2014-01-01" as date))

I'm hoping there is something like a FOR loop I can iterate over the k-v pairs.


Note, here is a related question -- How to extract all the keys in a JSON object with BigQuery, that shows how to get an arbitrary list of k,v pairs in a js udf. However, it has the following two limitations:

  1. JS requires an input type. It cannot allow ANY TYPE, and you do you not know the struct beforehand.
  2. If you convert to json you lose non-json types unless you preserve additional information someplace else. For example DATE "2014-01-01" and "2014-01-01" would both be encoded the same in JSON even if they are two different types.

Additionally, if this is an easier way to solve this. I am fine having the STRUCT as part of a stored table if it inspecting the INFORMATION_SCHEMA makes this simpler to solve. For example:

CREATE TABLE `my-dataset.123`.tbl AS (
    SELECT STRUCT(1 as num,"x" as str, date "2014-01-01" as date) as val
)

In this way perhaps we could create a js udf that has a signature something like this:

CREATE TEMP FUNCTION structFunc(obj ANY TYPE)
RETURNS json
LANGUAGE js
AS r"""
    function toTriples(
        obj_as_json:JSON, 
        metadata_from_information_schema:JSON
    ) {...}
""";

And here are two additional helpers to get the value as well as the info :

-- CREATE TABLE `my-dataset.123`.tbl AS (SELECT STRUCT(1 as num,"x" as str, date "2014-01-01" as date) as val);

-- Get the value
select to_json(val) from `my-dataset.123`.tbl

-- Get the info (array<field_path STRING, data_type STRING>)
select array(SELECT as struct
  field_path, data_type
FROM
  `my-dataset.123`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE
  table_name = 'tbl'
  AND column_name = 'val')
David542
  • 104,438
  • 178
  • 489
  • 842
  • Sorry the answer was messed up. Deleted. – Mehmet Kaplan May 31 '23 at 01:03
  • @MehmetKaplan -- I see, how were you able to enter in that function in BQ? Or was it another product or something? I was so excited when I saw your answer, I didn't think that was possible in BQ... – David542 May 31 '23 at 01:06

1 Answers1

0

To achieve the desired output of generating JSON with the key-value pairs and their corresponding types from a STRUCT in BigQuery, you can use a combination of SQL and JavaScript user-defined functions (UDFs). Here's an example implementation:

First, let's create a UDF in JavaScript that takes a STRUCT as input and returns the desired JSON output:

CREATE TEMP FUNCTION structFunc(obj ANY TYPE) RETURNS STRING LANGUAGE js AS """
      var result = [];
      for (var key in obj) {
        var value = obj[key];
        var type = typeof value;
        if (type === 'object' && value instanceof Date) {
          type = 'DATE';
          value = value.toISOString().substring(0, 10);
        }
        result.push([key, value, type]);
      }
      return JSON.stringify(result);
    """;

This UDF iterates over each key-value pair in the input STRUCT, determines the type of each value, and converts the value to the appropriate format if it's a Date. Finally, it returns the JSON-encoded result.

Now, you can use this UDF in your query to generate the desired JSON output:

SELECT structFunc(STRUCT(1 AS num, "hi" AS str, DATE "2014-01-01" AS date)) AS output;

The result will be a single row with the JSON-encoded output:

[["num",1,"INT64"],["str","hi","STRING"],["date","2014-01-01","DATE"]]

This approach should handle the different types (INT64, STRING, and DATE) and generate the desired JSON output.

Note: The example assumes that the STRUCT you pass to the UDF always contains valid values of the specified types. Additional validation or error handling may be needed depending on your specific use case.

M Y
  • 1,831
  • 4
  • 24
  • 52
  • M Y -- thanks, sorry I didn't award the bounty to you in time, I was going to and then it seems that it had passed the grace period. – David542 Jun 03 '23 at 02:14
  • It's fine, did it work for you? – M Y Jun 03 '23 at 03:31
  • 1
    Yes, though actually the code I ended up using was about a 250-line js udf. The problem with the above is it only accepts the supported types by BQ/JS udf. For example, DATETIME/TIMESTAMP/BYTES etc. etc. wouldn't have js-converted types, so I built something like a parser to do this (and also to handle recursion). – David542 Jun 03 '23 at 22:11
  • If you have time to put an example of this parser to this answer could help people in the future trying to do the same. – M Y Jun 05 '23 at 17:46
  • 1
    Looks like ChatGPT – DavidW Jun 09 '23 at 14:17