
I have a somewhat complicated SUPER array that I brought into Redshift using a REST API. 'API_table' currently looks like this: (see the table example image)

One of the sample columns "values" reads as follows:

values

[{"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},,{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"},...

I've queried the "value" data using:

SELECT c.values[0].value[0].value as v
FROM API_table c;

However, this only returns the first value "6.9" in each row and not all the "value" items in the row. The same approach doesn't work for extracting the "dateTime" items; it produces NULL values:

SELECT c.values[0].value[0].dateTime as dt
FROM API_table c;

The above example represents only one row of the table. My question is: is there a way to query every row of the table so that all of the "value" and "dateTime" items in every row can be extracted into a new table?

The desired result is:

v dt
6.9 2023-01-30T17:45:00.000-05:00
6.9 2023-01-30T18:00:00.000-05:00
6.9 2023-01-30T18:15:00.000-05:00

Many thanks.

I tried the following query, but it only returned a single "value" result for each row.

SELECT c.values[0].value[0].value as v
FROM API_table c;

When applied to the "dateTime" items, it yielded NULL values:

SELECT c.values[0].value[0].dateTime as dt
FROM API_table c;

===================================================================

@BillWeiner thanks, I worked through both the CTE and test-case examples and got the desired results (especially with the CTE). The only remaining issue is how to select the original table/column that contains the entire SUPER array so that it can be inserted into test1 (or col1 in the CTE case).

There are SUPER arrays in every row of the 'values' column, so the issue is still selecting that column and extracting each of the multiple value ("6.9") and dateTime items from every row.

================================================================

I've managed to get the query working when the JSON strings are explicitly listed in the insert into test1 values statement.

Now I'm running this query:

SET enable_case_sensitive_identifier TO true;

create table test1 (jvalues varchar(2048));

insert into test1 select c.values from ph_api c;

create table test2 as select json_parse(jvalues) as svalues from test1;

with recursive numbers(n) as
( select 0 as n
    union all
    select n + 1
    from numbers n
    where n.n < 20
), 
exp_top as
(   select c.svalues[n].value
    from test2 c
    cross join numbers n
)
,
exp_bot as
(   select c.value[n]
    from exp_top c
    cross join numbers n
    where c.value is not null
)
select *, value.value as v, value."dateTime" as dt
from exp_bot
where value is not null;

However, when I try to insert from the source table with insert into test1 SELECT c.values from table c; I get this error:

ERROR: column "jvalues" is of type character varying but expression is of type super
Hint: You will need to rewrite or cast the expression.
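One workaround I'm considering (untested; it assumes the source column is already SUPER, as in my ph_api table) is to serialize the SUPER value back to a JSON string so that it matches the varchar column:

-- JSON_SERIALIZE converts a SUPER value to its JSON text representation,
-- which should satisfy the varchar type of test1.jvalues
-- (jvalues may need to be wider than varchar(2048) for the full payload)
insert into test1
select json_serialize(c.values)
from ph_api c;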

I would like to be able to SELECT this source data:

sourceinfo variable values
{"siteName":"YAN","siteCode":[{"value":"01"}] {"variableCode":[{"value":"00600","network":"ID"} [{"value":[{"value":"3.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"4.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"}]
{"siteName":"YAN","siteCode":[{"value":"01"}] {"variableCode":[{"value":"00600","network":"ID"} [{"value":[{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"}]

as the jvalues, so that it can be unrolled into the desired result:

v dt
3.9 2023-01-30T17:30:00.000-05:00
4.9 2023-01-30T17:45:00.000-05:00
5.9 2023-01-30T18:00:00.000-05:00
6.9 2023-01-30T18:15:00.000-05:00

================================================================

The following query worked to select the desired json strings:

with exp_top as
(   select s.value 
    from <source_table> c, c.values s
)
select s.value, s."dateTime" from exp_top c, c.value s;
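To land those results in a new table (the original goal), I expect a CTAS wrapper along these lines would work (untested; new_table is just an illustrative name, and <source_table> is a placeholder as above):

-- case sensitivity must still be enabled so "dateTime" resolves correctly
SET enable_case_sensitive_identifier TO true;

create table new_table as
with exp_top as
(   select s.value
    from <source_table> c, c.values s
)
select s.value as v, s."dateTime" as dt
from exp_top c, c.value s;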
1 Answer


Yes. You need to expand each array element into its own row. A recursive CTE (or something similar) can generate a set of numbers, sized either to the max array length in the SUPER or to some fixed upper bound. This set of numbers then needs to be cross joined with your table to extract each array element.
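For example, a minimal numbers CTE looks like this (the 20 is just a fixed upper bound; size it to your longest array):

with recursive numbers(n) as
( select 0 as n
    union all
    select n + 1
    from numbers n
    where n.n < 20
)
select n from numbers;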

I wrote up a similar answer previously - Extract value based on specific key from array of jsons in Amazon Redshift - take a look and see if this gets you unstuck. Let me know if you need help adapting this to your situation.

==============================================================

Based on the comments it looks like a more specific example is needed. This little test case should help you understand what is needed to make this work.

I've repeated your data a few times to create multiple rows and to populate the outer array with 2 inner arrays. This hopefully shows how to unroll multiple nested arrays manually (the compact Redshift unrolling method is further down, but it is hard to understand if you don't get the concepts down first).

First set up the test data:

SET enable_case_sensitive_identifier TO true;

create table test1 (jvalues varchar(2048));

insert into test1 values
('[{"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}, {"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}]'),
('[{"value":[{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"5.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"8.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}, {"value":[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T18:45:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:00:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:15:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:30:00.000-05:00"},{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T19:45:00.000-05:00"},{"value":"6.8","qualifiers":["P"],"dateTime":"2023-01-30T20:00:00.000-05:00"}]}]');

create table test2 as select json_parse(jvalues) as svalues from test1;

Note that we have to turn on case sensitivity for the session to be able to select "dateTime" correctly.

Then unroll the arrays manually:

with recursive numbers(n) as
( select 0 as n
    union all
    select n + 1
    from numbers n
    where n.n < 20
), 
exp_top as
(   select row_number() over () as r, n as x, c.svalues[n].value
    from test2 c
    cross join numbers n
)
,
exp_bot as
(   select r, x, n as y, c.value[n]
    from exp_top c
    cross join numbers n
    where c.value is not null
)
select *, value.value as v, value."dateTime" as dt
from exp_bot
where value is not null;

This version:

  1. creates the numbers 0 - 19,
  2. expands the outer array (2 elements in each row) by cross joining with these numbers,
  3. expands the inner array by the same method,
  4. produces the desired results.

Redshift has a built-in method for unrolling SUPER arrays, and it is defined in the FROM clause. You can produce the same results with:

with exp_top as (select inx1, s.value from test2 c, c.svalues s at inx1)
select inx1, inx2, c.value[inx2] as value, s.value, s."dateTime" from exp_top c, c.value s at inx2;

Much more compact. This code has been tested and runs as-is in Redshift. If you see the "dateTime" value as NULL, it is likely that you don't have case sensitivity enabled.

==========================================================

To also have the original super column in the final result:

with exp_top as (select c.svalues, inx1, s.value from test2 c, c.svalues s at inx1)
select svalues, inx1, inx2, c.value[inx2] as value, s.value, s."dateTime" from exp_top c, c.value s at inx2;

==========================================================

I think that unrolling your actual data will be simpler than the code I provided for the general question.

First, you don't need the test1 and test2 tables; you can query your table directly. If you still want to use test2, then use your table as the source of the "create table test2 ..." statement. But let's see if we can just use your source table.

with exp_top as (
  select s.value from <your table> c, c.values s
)
select s.value, s."dateTime" from exp_top c, c.value s;

This code is untested but should work.
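If you'd rather keep the test2 staging step anyway, something like this should create it straight from your source table; since the "values" column is already SUPER, there is no need for json_parse or a varchar staging table (column and table names are assumptions from your question):

-- stage the SUPER column directly, no json_parse needed
create table test2 as
select c.values as svalues
from <your table> c;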

Bill Weiner
  • shouldn't you vote this as a duplicate? Also this isn't more than a link-only answer – nbk Feb 14 '23 at 23:34
  • Possibly, possibly not. Someone steeped in the art will be able to become "unstuck" by looking at a previous answer that "rhymes". Someone newer to these tools will likely need more direction. I'll add some more description of the needed actions to address your concern. – Bill Weiner Feb 15 '23 at 00:12
  • Thanks for the helpful info. I managed to get it working when the JSON string is fully specified in: inputtext as ( select '[{"value":"6.9","qualifiers":["P"],"dateTime":"2023-01-30T17:30:00.000-05:00"},]'::text as col1 ) But not when I try specifying the table to get the full results as: inputtext as ( select c.values[0].value[0].value::text as col1 from table c ) Any direction here would be much appreciated. – camaya Feb 15 '23 at 00:27
  • @camaya make a proper answer and accept it – nbk Feb 15 '23 at 04:02
  • I added a working test case to my answer which hopefully will help you. Your comment leaves too much information out to answer directly. If you work through the example added you should see how to get your desired results. If not, please provide a complete test case of your data, current progress, and an issue description in your original question. – Bill Weiner Feb 15 '23 at 18:35
  • @BillWeiner I updated the question to reflect only what's missing to resolve the issue. – camaya Feb 16 '23 at 16:14
  • That's straightforward: you just need to pass svalues up through the CTE(s). I'll append to the answer and update the compact code to show this. – Bill Weiner Feb 16 '23 at 17:17
  • @BillWeiner Thanks for the update. What I meant was to find out how to insert the original table into the 'insert into test1 values' query instead of the individual json strings so that the table as a whole could be unrolled as opposed to having to move the json strings into the 'insert into test1 values' query separately. – camaya Feb 16 '23 at 20:45
  • So if I'm hearing you correctly you want to directly insert the values into the super column w/o the use of the test1 table, right? One step vs. two? And this needs to be done through SQL data values, not reading data from S3, right? Would moving the first select to a CTE meet the requirement? Or am I still missing the question? – Bill Weiner Feb 16 '23 at 20:51
  • @BillWeiner The idea is to insert the values from the original super table column which is a SQL data table, not S3, into the test1 table in much the same way you did in your answer with the json strings. I think moving the first select into a CTE would meet the requirement so long as that select is calling the original super table column into that CTE. – camaya Feb 16 '23 at 21:07
  • Trying not to overthink this. Are you looking for "INSERT into test2 SELECT from :"? Is this what you are looking for? – Bill Weiner Feb 16 '23 at 21:59
  • @BillWeiner I'm looking for something to select the original table, along the lines of 'insert into test1 SELECT FROM ;' so that I may use the json values in the original table and unroll them with the rest of the script you provided. – camaya Feb 16 '23 at 22:19
  • So INSERT INTO test SELECT A, B, C, from ; ??? Then just carry these along through the query. If I still don't get what you are after you will want to update the question to show the source data and the desired output. – Bill Weiner Feb 16 '23 at 22:47
  • @BillWeiner Ok, I updated the question because I couldn't get the SELECT to work. – camaya Feb 17 '23 at 00:13
  • @BillWeiner, I finally got it to work. Thanks a lot for your help! – camaya Feb 17 '23 at 15:52