
I want to store raw API responses in CLOBs and, through a procedure, parse the JSON contained in the CLOBs in a generic manner, meaning I have to construct the JSON format dynamically. One way of doing this would be to regex the first key:value pairs and extract a single list of all the keys to act as the column headers. These column headers could then be used to dynamically create the JSON format, and the rest of the JSON in the given CLOB could then be parsed.

I have this standard snippet here for parsing a known json-string:

SELECT *
FROM   json_table(
         ( SELECT json_response FROM Some_Table WHERE Condition = #input_parameter# ),
         '$[*]'
         COLUMNS (
           column1 VARCHAR2(256) PATH '$.column1',
           column2 VARCHAR2(256) PATH '$.column2',
           column3 VARCHAR2(256) PATH '$.column3'
         )
       )

Here, column1, column2, and column3 are known and defined. Say a different CLOB is selected, with additional columns: I would need to regex the keys from the JSON as column headers, then dynamically create the column definition seen above (standard type VARCHAR2(256) to keep it simple).

The JSON within the CLOB is a list of non-nested JSON objects, and it always has this format:

[{"column1":value1,"column2":"value2","column3":value3}, {...}, {...}, ...]

So the regex would capture everything between the first {}-pair and extract the field names to the left of the colon separating each key from its value.
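
As a minimal sketch of what I have in mind (illustrative only; the second statement uses a hardcoded literal just to demonstrate the key capture):

-- Grab the first {...} object from the JSON array in the CLOB.
SELECT REGEXP_SUBSTR( json_response, '\{[^}]*\}', 1, 1 ) AS first_obj
FROM   Some_Table;

-- Pull the n-th key out of that object; capture group 1 is the key name
-- (here occurrence 2 returns 'column2').
SELECT REGEXP_SUBSTR( first_obj, '"([^"]+)"\s*:', 1, 2, NULL, 1 ) AS second_key
FROM   ( SELECT '{"column1":"value1","column2":"value2"}' AS first_obj FROM dual );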

Going from here, how can I dynamically create the column definition from the regex results?

With the query above, the given example data produces this table:

<table style="width: 5%;" border="2" cellpadding="5">
  <tbody>
    <tr>
      <td>&nbsp;column1&nbsp;</td>
      <td>&nbsp;column2&nbsp;</td>
      <td>&nbsp;column3&nbsp;</td>
    </tr>
    <tr>
      <td>&nbsp;value1</td>
      <td>&nbsp;value2</td>
      <td>&nbsp;value3</td>
    </tr>
  </tbody>
</table>

Given a CLOB containing any number of columns, I need to extract the columns in a similar way and create a table like this:

<table style="width: 5%;" border="2" cellpadding="5">
  <tbody>
    <tr>
      <td style="width: 20%;">&nbsp;column1</td>
      <td style="width: 16.2539%;">column2&nbsp;</td>
      <td style="width: 16.7461%;">column3&nbsp;</td>
      <td style="width: 17%;">...&nbsp;</td>
      <td style="width: 19%;">column_n&nbsp;</td>
    </tr>
    <tr>
      <td style="width: 20%;">&nbsp;value1</td>
      <td style="width: 16.2539%;">&nbsp;value2</td>
      <td style="width: 16.7461%;">&nbsp;value3</td>
      <td style="width: 17%;">...</td>
      <td style="width: 19%;">&nbsp;value</td>
    </tr>
    <tr>
      <td style="width: 20%;">&nbsp;...</td>
      <td style="width: 16.2539%;">&nbsp;...&nbsp;</td>
      <td style="width: 16.7461%;">&nbsp;...</td>
      <td style="width: 17%;">...&nbsp;</td>
      <td style="width: 19%;">&nbsp;...</td>
    </tr>
    <tr>
      <td style="width: 20%;">&nbsp;value</td>
      <td style="width: 16.2539%;">&nbsp;value</td>
      <td style="width: 16.7461%;">&nbsp;value</td>
      <td style="width: 17%;">...&nbsp;</td>
      <td style="width: 19%;">&nbsp;value</td>
    </tr>
  </tbody>
</table>
bullfighter

1 Answer

Do not use regular expressions to parse JSON; use a proper JSON parser.

From this answer, you can write the functions:

CREATE FUNCTION get_keys(
  value IN CLOB
) RETURN SYS.ODCIVARCHAR2LIST PIPELINED
IS
  js   JSON_OBJECT_T := JSON_OBJECT_T( value );
  keys JSON_KEY_LIST;
BEGIN
  -- Pipe out each top-level key of the JSON object, one row per key.
  keys := js.get_keys();
  FOR i IN 1 .. keys.COUNT LOOP
    PIPE ROW ( keys(i) );
  END LOOP;
  RETURN;
END;
/

CREATE FUNCTION get_value(
  value IN CLOB,
  path  IN VARCHAR2
) RETURN VARCHAR2
IS
  js JSON_OBJECT_T := JSON_OBJECT_T( value );
BEGIN
  -- Note: get_string takes a key name (passed in here as "path"),
  -- not a full JSON path expression.
  RETURN js.get_string( path );
END;
/
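
As a quick sanity check of get_value (an arbitrary literal, not from the sample data):

-- Returns 'value2': get_string is given a key name, not a JSON path.
SELECT get_value( '{"column1":"value1","column2":"value2"}', 'column2' ) AS val
FROM   dual;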

Then use the query:

SELECT t.id,
       jt.idx,
       k.COLUMN_VALUE AS Key,
       get_value( jt.json_obj, k.COLUMN_VALUE ) AS value
FROM   TABLE_NAME t
       CROSS APPLY JSON_TABLE(
         t.json_value,
         '$[*]'
         COLUMNS (
           idx FOR ORDINALITY,
           json_obj VARCHAR2(4000) FORMAT JSON PATH '$'
         )
       ) jt
       CROSS APPLY get_keys( jt.json_obj ) k

Which, for the sample data:

CREATE TABLE table_name (
  id         NUMBER
             GENERATED ALWAYS AS IDENTITY
             PRIMARY KEY,
  json_value CLOB
             CHECK( json_value IS JSON )
);

INSERT INTO table_name ( json_value ) VALUES (
'[{"column1":"value1","column2":"value2","column3":"value3"},
{"column1":"value4","column2":"value5","column3":"value6"},
{"column3":"value9","column1":"value7","column2":"value8"}]'
);

Outputs:

ID | IDX | KEY     | VALUE
---+-----+---------+--------
 1 |   1 | column1 | value1
 1 |   1 | column2 | value2
 1 |   1 | column3 | value3
 1 |   2 | column1 | value4
 1 |   2 | column2 | value5
 1 |   2 | column3 | value6
 1 |   3 | column3 | value9
 1 |   3 | column1 | value7
 1 |   3 | column2 | value8

db<>fiddle here

Going from here, how can I dynamically create the column definition from the [...] results?

You want a dynamic pivot; this is not something that Oracle (or most RDBMSs) supports, and you are better off performing the transposition of rows into columns in whatever middle-tier application (Java, PHP, C#, Python, etc.) you are using to access the database.


If you know there are going to be 3 keys then you could use:

SELECT *
FROM   (
  SELECT t.id,
         jt.idx,
         k.COLUMN_VALUE AS Key,
         get_value( jt.json_obj, k.COLUMN_VALUE ) AS value,
         ROW_NUMBER() OVER (PARTITION BY t.id, jt.idx ORDER BY k.COLUMN_VALUE)
           AS key_index
  FROM   TABLE_NAME t
         CROSS APPLY JSON_TABLE(
           t.json_value,
           '$[*]'
           COLUMNS (
             idx FOR ORDINALITY,
             json_obj VARCHAR2(4000) FORMAT JSON PATH '$'
           )
         ) jt
         CROSS APPLY get_keys( jt.json_obj ) k
)
PIVOT (
  MAX(key) AS key,
  MAX(value) AS value
  FOR key_index IN (1, 2, 3)
)

Which outputs:

ID | IDX | 1_KEY   | 1_VALUE | 2_KEY   | 2_VALUE | 3_KEY   | 3_VALUE
---+-----+---------+---------+---------+---------+---------+---------
 1 |   1 | column1 | value1  | column2 | value2  | column3 | value3
 1 |   2 | column1 | value4  | column2 | value5  | column3 | value6
 1 |   3 | column1 | value7  | column2 | value8  | column3 | value9

Note: You cannot dynamically set the column names to the JSON keys and you cannot get it to work for a dynamic number of columns.

db<>fiddle here

MT0
  • This came pretty close to extracting what I needed. However, I need the columns to remain in tabular form, not pivoted from columns to rows. With the code provided in the OP, I can convert the JSON string in the CLOB to a table, but I need to do the same dynamically for any JSON that is passed as a CLOB (and where I might not know the column definitions). How would the above look without a pivot? – bullfighter Aug 24 '21 at 12:57
  • @bullfighter See the last paragraph; it is not possible in Oracle to do a dynamic pivot and you should transpose rows to columns using the middle-tier application. *Note: There is no pivot (or unpivot) in the query.* – MT0 Aug 24 '21 at 13:14
  • @bullfighter I have added an option for a known, fixed number of keys. – MT0 Aug 24 '21 at 13:24
  • thank you for taking the time to answer and explain. It's really regrettable that Oracle is missing such a feature :( Is there perhaps a "hacky" variant, i.e. constructing a query string and executing it using EXECUTE IMMEDIATE? I'm imagining: 1) retrieve the string of keys using the JSON lib, 2) loop over the string's elements, using concat to construct each column definition and append it to a main string containing all the column definitions, 3) execute an altered version of the query in the OP through EXECUTE IMMEDIATE by constructing the query string – bullfighter Aug 24 '21 at 13:35
  • The alternative would be to create dozens of individualized stored procedures that can capture each individual JSON string format possibility from the source, and that sounds both dangerous and messy. Sadly I can't use Python or something else, as the data is loaded by an ETL tool (with deprecated JSON parsing), and therefore the conversion from JSON to a tabular format has to be done in Oracle – bullfighter Aug 24 '21 at 13:37
  • @bullfighter It is not a limitation of Oracle, it is a limitation of SQL, as you **MUST** know all the columns for each `SELECT` clause before you run a query. Yes, you could dynamically build the query using string concatenation in PL/SQL and then use `EXECUTE IMMEDIATE` to run it; however, it's not a solution that I would like to build, so I'll leave that to you to complete (a rough sketch of that idea follows below this thread). – MT0 Aug 24 '21 at 18:13
  • Thank you very much for all the good answers sir! – bullfighter Aug 24 '21 at 18:17
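
For completeness, here is a rough sketch of the EXECUTE IMMEDIATE variant discussed in the comments, reusing the get_keys function from the answer. It assumes every object in the array has the same keys as the first one; the name parse_json_clob and its parameter are hypothetical, and the key names are concatenated into the SQL text unescaped, so treat it as a sketch rather than production code:

CREATE OR REPLACE FUNCTION parse_json_clob (
  p_clob IN CLOB
) RETURN SYS_REFCURSOR
IS
  v_first_obj VARCHAR2(4000);
  v_cols      VARCHAR2(32767);
  v_cur       SYS_REFCURSOR;
BEGIN
  -- Take the keys from the first object in the array
  -- (assumes every object has the same keys).
  SELECT json_query( p_clob, '$[0]' )
  INTO   v_first_obj
  FROM   dual;

  -- Build the COLUMNS clause: one VARCHAR2(256) column per key.
  FOR k IN ( SELECT COLUMN_VALUE AS key_name
             FROM   TABLE( get_keys( v_first_obj ) ) )
  LOOP
    v_cols := v_cols
              || CASE WHEN v_cols IS NOT NULL THEN ', ' END
              || '"' || k.key_name || '" VARCHAR2(256) PATH ''$."'
              || k.key_name || '"''';
  END LOOP;

  -- Assemble and open the dynamic JSON_TABLE query.
  OPEN v_cur FOR
    'SELECT jt.* FROM json_table( :b, ''$[*]'' COLUMNS ( ' || v_cols || ' ) ) jt'
    USING p_clob;
  RETURN v_cur;
END;
/

A client that understands ref cursors can then fetch from the returned cursor without knowing the column list in advance; the usual dynamic SQL caveats (injection through key names, the blanket VARCHAR2(256) assumption for every value) still apply.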