0

Hello folks & fellow developers,

I am working on an Oracle DB and have some XML data stored in blob format, The XML has:

a) a formula (eg. %1 - %2 , or %2||'-'||%1)

b) a delimited list of variables that fit into the formula (eg. SALE_Q1| SALE_Q2 or YEAR| MONTH)

I have managed to pull this data into 2 different columns (can be merged into 1 column if needed too) and what I need to do is get the output column as a superimposition of the variables onto the placeholders.

(eg. SALE_Q1 - SALE_Q2 , or MONTH||'-'||YEAR)

there are also a few caveats to this like:

  1. I don't know how many variables will each formula have and,
  2. the variables will not always be used in the formula in the same order as the delimited list (see eg.2)

we can consider the data coming from a query like: SELECT formula || ', ' || columns_used FROM data_table; for the input string and the output i am currently getting is like:

%1||'-'||%2||%3, SITE_NO| SITE_NAME| COUNTRY
0.1 * %1, WIND_RES
%1, TOTAL
CASE WHEN LENGTH(%1) < 8 THEN NULL ELSE TO_DATE(%1,'yyyymmdd')END, MIN_DATE
%1(+)=%3 and %2(+)=%4, ABC| LMN| PQR| XYZ

I am pretty new to SQL and this requirement is going over my head, any help will be highly appreciated. I need an SQL solution as a PL/SQL solution is not feasible in my requirement (this script is going to pull data from one DB and feed it to another repo on a regular basis)

I have seen some articles on XML table, model, or recursive regexp but I am not sure how I can use these. I have also seen this question on StackOverflow but my requirement is a little different from that and is a bit trickier as well. getting the formula string and the variables into the same string for processing like that question is also a plausible route. Any solution that can be written in an SQL query will be very helpful.

some more examples for your consideration:

"%1||'-'||%2||%3, SITE_NO| SITE_NAME| COUNTRY" => "SITE_NO||'-'||SITE_NAME||COUNTRY"

"0.1 * %1, WIND_RES" => "0.1 * WIND_RES"

"%1, TOTAL" => "TOTAL"

"CASE WHEN LENGTH(%1) < 8 THEN NULL ELSE TO_DATE(%1,'yyyymmdd')END, MIN_DATE" => "CASE WHEN LENGTH(MIN_DATE) < 8 THEN NULL ELSE TO_DATE(MIN_DATE,'yyyymmdd')END"

"%1(+)=%3 and %2(+)=%4, ABC| LMN| PQR| XYZ" => "ABC(+)=PQR and LMN(+)=XYZ"
MT0
  • 143,790
  • 11
  • 59
  • 117
  • You appear to be building something that will build dynamic SQL; this looks like a recipe for making your code vulnerable to SQL injection and that is almost certainly not something that you want to put into a production system. Also, your last example looks very much like you are using Oracle's legacy comma join syntax; please migrate to using ANSI joins. – MT0 Mar 24 '21 at 09:33
  • Also, you appear to be using `||` as the string concatenation operator and `|` as your delimiter. What logic is there to ensure that the `|` characters retain their appropriate meaning? Is it that the `|` before a comma are concatenation operators and the `|` after the comma are always delimiters? Can you ensure that there is never going to be two commas? – MT0 Mar 24 '21 at 09:36
  • @MT0 the script is not going to be exposed to any client end and the data input is not coming from a transactional DB, this is a script is part of a system to get a lot of aggregated data to a DB for analysis. The formulee are coming from a reporting tool and hence the sntax looks legacy (i can't change that). I can however change the delimiter, all suggestions are available ", " or "|" "/" or any symbol that you can suggest – kunal kavthekar Mar 24 '21 at 09:46
  • @MT0, the input string and the list of variables can also be used as 2 different columns if that would the task easier – kunal kavthekar Mar 24 '21 at 09:51

1 Answers1

1

I am assuming that the last comma is the delimiter between the template string and the delimited terms. You can use a recursive sub-query factoring clause and simple string functions:

WITH terms ( value, terms, num_terms ) AS (
  SELECT SUBSTR( value, 1, INSTR( value, ', ', -1 ) - 1 ),
         SUBSTR( value, INSTR( value, ', ', -1 ) + 2 ),
         REGEXP_COUNT(
           SUBSTR( value, INSTR( value, ', ', -1 ) + 2 ),
           '.+?(\| |$)'
         )
  FROM   table_name
),
term_bounds ( rn, value, terms, start_pos, lvl ) AS (
  SELECT ROWNUM,
         REPLACE(
           value,
           '%' || num_terms,
           CASE num_terms
           WHEN 1
           THEN terms
           ELSE SUBSTR( terms, INSTR( terms, '| ', 1, num_terms - 1 ) + 2 )
           END
         ),
         terms,
         CASE
         WHEN num_terms > 1
         THEN INSTR( terms, '| ', 1, num_terms - 1 )
         ELSE 1
         END,
         num_terms
  FROM   terms
UNION ALL
  SELECT rn,
         REPLACE(
           value,
           '%' || (lvl - 1),
           CASE lvl - 1
           WHEN 1
           THEN SUBSTR( terms, 1, start_pos - 1 )
           ELSE SUBSTR(
                  terms,
                  INSTR( terms, '| ', 1, lvl - 2 ) + 2,
                  start_pos - INSTR( terms, '| ', 1, lvl - 2 ) - 2
                )
           END
         ),
         terms,
         CASE
         WHEN lvl > 2
         THEN INSTR( terms, '| ', 1, lvl - 2 )
         ELSE 1
         END,
         lvl - 1
  FROM   term_bounds
  WHERE  lvl > 1
)
SEARCH DEPTH FIRST BY rn SET rn_order
SELECT value
FROM   term_bounds
WHERE  lvl = 1;

Which, for the sample data:

CREATE TABLE table_name ( value ) AS
SELECT '%1||'-'||%2||%3, SITE_NO| SITE_NAME| COUNTRY' FROM DUAL UNION ALL
SELECT '0.1 * %1, WIND_RES' FROM DUAL UNION ALL
SELECT '%1, TOTAL' FROM DUAL UNION ALL
SELECT 'CASE WHEN LENGTH(%1) < 8 THEN NULL ELSE TO_DATE(%1,'yyyymmdd')END, MIN_DATE' FROM DUAL UNION ALL
SELECT '%1(+)=%3 and %2(+)=%4, ABC| LMN| PQR| XYZ' FROM DUAL UNION ALL
SELECT '%1, %2, %3, %4, %5, %6, %7, %8, %9, %10, %11, ONE| TWO| THREE| FOUR| FIVE| SIX| SEVEN| EIGHT| NINE| TEN| ELEVEN' FROM DUAL UNION ALL
SELECT '%%%%%%%7, HELLO| 1| 2| 3| 4| 5| 6' FROM DUAL

Outputs:

| VALUE                                                                                      |
| :----------------------------------------------------------------------------------------- |
| SITE_NO||'-'||SITE_NAME||COUNTRY                                                 |
| 0.1 * WIND_RES                                                                             |
| TOTAL                                                                                      |
| CASE WHEN LENGTH(MIN_DATE) < 8 THEN NULL ELSE TO_DATE(MIN_DATE,'yyyymmdd')END |
| ABC(+)=PQR and LMN(+)=XYZ                                                                  |
| ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN                          |
| HELLO                                                                                      |

db<>fiddle here


Or as a PL/SQL function:

CREATE FUNCTION substitute_values(
  i_value          IN VARCHAR2,
  i_terms          IN VARCHAR2,
  i_term_delimiter IN VARCHAR2 DEFAULT '| '
) RETURN VARCHAR2 DETERMINISTIC
IS
  TYPE term_list_type IS TABLE OF VARCHAR2(200);
  v_start  PLS_INTEGER := 1;
  v_end    PLS_INTEGER;   
  v_index  PLS_INTEGER;
  v_terms  term_list_type := term_list_type();
  v_output VARCHAR2(4000) := i_value;
BEGIN
  LOOP
    v_end := INSTR(i_terms, i_term_delimiter, v_start);
    v_terms.EXTEND;
    IF v_end > 0 THEN
      v_terms(v_terms.COUNT) := SUBSTR(i_terms, v_start, v_end - v_start);
    ELSE
      v_terms(v_terms.COUNT) := SUBSTR(i_terms, v_start);
      EXIT;
    END IF;
    v_start := v_end + LENGTH(i_term_delimiter);
  END LOOP;
  
  LOOP
    v_index  := TO_NUMBER(REGEXP_SUBSTR(v_output, '^(.*?)%(\d+)', 1, 1, NULL, 2));
    IF v_index IS NULL THEN
      RETURN v_output;
    ELSIF v_index > v_terms.COUNT THEN
      RETURN NULL;
    END IF;
    v_output := REGEXP_REPLACE(v_output, '^(.*?)%(\d+)', '\1' || v_terms(v_index));
  END LOOP;
END;
/

Then:

SELECT SUBSTITUTE_VALUES(
         SUBSTR(value, 1, INSTR(value, ', ', -1) - 1),
         SUBSTR(value, INSTR(value, ', ', -1) + 2)
       ) AS value
FROM   table_name

Outputs:

VALUE
SITE_NO||'-'||SITE_NAME||COUNTRY
0.1 * WIND_RES
TOTAL
CASE WHEN LENGTH(MIN_DATE) < 8 THEN NULL ELSE TO_DATE(MIN_DATE,'yyyymmdd')END
ABC(+)=PQR and LMN(+)=XYZ
ONE, TWO, THREE, FOUR, FIVE, SIX, SEVEN, EIGHT, NINE, TEN, ELEVEN
HELLO

db<>fiddle here

MT0
  • 143,790
  • 11
  • 59
  • 117
  • I ran your query and it ran like a charm but I have a corner case that I would like to ask about **how can I incorporate double-digit numbers for place holders like %15 or %21** – kunal kavthekar Mar 24 '21 at 15:17
  • 1
    @kunalkavthekar Updated to start at the end and work backwards from the highest number to the lowest. – MT0 Mar 24 '21 at 21:03
  • thanks a lot, that did the trick, and sorry for the late reply – kunal kavthekar Mar 29 '21 at 05:41
  • thanks for helping me with this query, we have been using it for a while now but now they have come back to me asking if it is possible to write this query without the "union all" and recursion (i tried explaining that it is required). I have been thinking that as it is in Oracle, is it possible to write it using the oracle "connect by" structure? if so could you please help me with that? – kunal kavthekar Jul 14 '21 at 05:51
  • @MTO can you please help me with this? I need a solution to this string substitution problem that works without a "union all" or recursion as it is in its current form – kunal kavthekar Jul 14 '21 at 15:41
  • @kunalkavthekar You could try writing a PL/SQL function (or a Java function) to do the substitution and then invoke that in the SQL query. – MT0 Jul 14 '21 at 21:28
  • @MTO, I can't use Java for this. my only option is a query or a PL/SQL program in Oracle. Can you please elaborate on how can this be done in PL/SQL without recursion? The main reason to look for such a solution is that this query is being run daily on a few million records and even though it is working correctly, we see a performance issue as the DB scales up in size – kunal kavthekar Jul 15 '21 at 02:54
  • Create a function to: split the replacement terms up and store them as a collection (or change the database to store a collection of terms in a nested table and then pass that directly into the function and eliminate the need for repeated parsing); then iterate over the string using a regular expression to match `^(.*?)%(\d+)` (or something similar) and replace the placeholder with the term; finally, when there are no more replacements then return the updated string. – MT0 Jul 15 '21 at 08:43
  • You can do it in Java inside the database using a `CREATE JAVA` statement and then wrapping it in a PL/SQL function; you may find that you can write the regular expression more efficiently in Java and it may end up being faster but you should profile that first. – MT0 Jul 15 '21 at 08:46
  • @MTO, So are you saying like writing a java program inside a PL/SQL block (`CREATE JAVA`) that takes 2 string arguments (the expression string and the set of replacement terms). Next split the replacement terms and use a loop construct to replace the placeholders into the expression and return the expression. Finally, call this function for each row in the database? – kunal kavthekar Jul 15 '21 at 09:44
  • I have not done any CREATE JAVA stuff before but I'll try, any tips or resources would be highly appreciated, but thanks for your suggestion, I will try this – kunal kavthekar Jul 15 '21 at 09:46
  • Regarding using Java, you can [search my posts for `[oracle] create java`](https://stackoverflow.com/search?q=user%3A1509264+%5Boracle%5D+create+java) for examples from a simple [hello world function](https://stackoverflow.com/a/60600349/1509264) to a more complex program for [reading from zipped blobs](https://stackoverflow.com/a/59250531/1509264). – MT0 Jul 15 '21 at 10:10
  • @MTO, is there any other way to do this in a query? I will pitch this method to my team but I would like to know about the options or why this can't be done with a query (any other way) – kunal kavthekar Jul 15 '21 at 10:53
  • 1
    No, you need to repeatedly apply the replacement and this can only be done by repeatedly iterating over the string. You can do that in SQL, PL/SQL (examples of both in the question) or in another language such as Java which can be accessed from the database. The only other improvement I can suggest is to calculate the replacement text when the values are inserted/updated (either using a trigger to populate another column/table or in a materialized view) so that you do not have to regenerate the text with every `SELECT`. – MT0 Jul 15 '21 at 11:18