
I have a JSON message like the one below. I am using dbt with the BigQuery plugin, and I need to create a table dynamically in BigQuery.

{
  "data": {
    "schema": "dev",
    "payload": {
      "lastmodifieddate": "2022-11-122 00:01:28",
      "changeeventheader": {
        "changetype": "UPDATE",
        "changefields": [
          "lastmodifieddate",
          "product_value"
        ],
        "committimestamp": 18478596845860,
        "recordIds": [
          "568069"
        ]
      },
      "product_value": 20000
    }
  }
}

I need to create the table dynamically with recordIds and the changed fields. The field list changes dynamically whenever the source sends an update. Expected output:

recordIds | product_value | lastmodifieddate     | changetype
568069    | 20000         | 2022-11-122 00:01:28 | UPDATE

Thanks for your suggestions and help!

Chandra

1 Answer


JSON objects can be saved in a BigQuery table. There is no need to use dbt here.

-- Build a sample row that holds the message as a JSON value
with tbl as (
  select 5 as row, JSON '''{
  "data": {
    "schema": "dev",
    "payload": {
      "lastmodifieddate": "2022-11-122 00:01:28",
      "changeeventheader": {
        "changetype": "UPDATE",
        "changefields": [
          "lastmodifieddate",
          "product_value"
        ],
        "committimestamp": 18478596845860,
        "recordIds": [
          "568069"
        ]
      },
      "product_value": 20000
    }
  }
}''' as JS
)
select
  *,
  -- recordIds is a JSON array, so extract it as an array of strings
  JSON_EXTRACT_STRING_ARRAY(JS.data.payload.changeeventheader.recordIds) as recordIds,
  -- scalar fields can be pulled out with JSON_EXTRACT_SCALAR / JSON_VALUE
  JSON_EXTRACT_SCALAR(JS.data.payload.product_value) as product_value,
  JSON_VALUE(JS.data.payload.lastmodifieddate) as lastmodifieddate,
  JSON_VALUE(JS.data.payload.changeeventheader.changetype) as changetype
from tbl

If the JSON is saved as a string in a BigQuery table, use PARSE_JSON(column_name) to convert the string to JSON first.
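For example, a minimal sketch of that pattern, assuming the raw table is called raw_events and the message arrives as a STRING column named payload_string (both names are placeholders):

with parsed as (
  -- PARSE_JSON converts the raw string column into a JSON value
  select PARSE_JSON(payload_string) as JS
  from raw_events
)
select
  JSON_EXTRACT_STRING_ARRAY(JS.data.payload.changeeventheader.recordIds) as recordIds,
  JSON_VALUE(JS.data.payload.product_value) as product_value,
  JSON_VALUE(JS.data.payload.lastmodifieddate) as lastmodifieddate,
  JSON_VALUE(JS.data.payload.changeeventheader.changetype) as changetype
from parsed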

Samuel
  • Thanks for the inputs. I see we are directly selecting "product_value" and "lastmodifieddate"; instead I want to iterate the "changefields" array and populate those columns dynamically, as this list can vary with every update. Can you suggest how to do that? – Chandra Nov 13 '22 at 10:58
  • @Chandra you can also access "changefields". This is an array of column names, but without any values. Could you please provide another sample JSON, and explain how it is embedded in your raw table? – Samuel Nov 13 '22 at 11:24
  • We are storing the above JSON in a raw table as a column through Kafka streams. As per the source, changefields only contains column names without any values, but the updated columns and their values are sent in the payload (please see the payload in the JSON above). This changefields array varies every time the source sends an update. Down the line I need to form the full record in the staging area by combining the previous row's columns with the updated column values before pushing it to the next layer. That is the reason I need to populate the table dynamically with the changed-field columns. Thanks – Chandra Nov 13 '22 at 13:29
  • The array does not contain enough information to extract the values: only the keys are listed, not their hierarchy position. For dbt: first get a list of all keys. Please use the UDF from [here](https://stackoverflow.com/questions/74358691/how-to-retrieve-the-list-of-dynamic-nested-keys-of-bigquery-nested-records/74359647#74359647): `Select distinct out from (select jsonObjectKeys(TO_JSON_STRING(your_json_column), false, true) as output2 from your_table), unnest(output2) as out`. In the second dbt model, query your table for each of these entries using `JSON_VALUE(...) as ...` – Samuel Nov 13 '22 at 14:30
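A rough sketch of the two-step approach from the last comment, assuming the jsonObjectKeys UDF from the linked answer has already been created, and that your_table and your_json_column are placeholders for the raw Kafka table and its JSON column (wrap the column in PARSE_JSON first if it is stored as a STRING):

-- Step 1 (first dbt model): list every distinct key that appears in the messages
select distinct out
from (
  select jsonObjectKeys(TO_JSON_STRING(your_json_column), false, true) as output2
  from your_table
), unnest(output2) as out

-- Step 2 (second dbt model): extract each key found in step 1, e.g.
select
  JSON_VALUE(your_json_column.data.payload.product_value) as product_value,
  JSON_VALUE(your_json_column.data.payload.lastmodifieddate) as lastmodifieddate
from your_table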