1



I wrote the stored procedure below for the purposes of reading a comma delimited file that is moved onto the server ("DIR" folder), and when script is executed, it essentially parses the file (.csv), and assigns the data to its respective variables (xJOB_ID, xCTRL_ID, xACCT_SEC, xCREATEDON_DATE) such that I can insert the data into a table.

I am using Oracle SQL Developer version 4.0.0.13 in a Windows 7 environment. Fortunately, after banging my head on the table a couple of times the code works and I have not had any issues running the script.

Example format of the file:
1111, 2, T, 10/10/2000
2222, 12345, U, 10/10/2001
5555, 123, S, 10/10/1999

MY QUESTION: I found a little difficulty using the SUBSTRING & INSTRING functions to parse the data and wanted to know how can I improve the script so that in the event some debugging was needed, it can be easily resolved for someone that did not write the stored procedure.

Please let me know if that makes sense. I gave you the entire script so that you can understand what I was trying to accomplish and so that I can improve the code for debugging purposes.

create or replace PROCEDURE SP_INSERT_INTO_TABLE(xFILE_NAME IN VARCHAR2)

IS 


--UTL_FILE is an oracle package that allows you to read and write operating system files. 

TEXT_DATA UTL_FILE.FILE_TYPE;

v_ROW_LENGTH    NUMBER := 1024;
v_TEXTSTRING    VARCHAR2(4000);
cLINE           VARCHAR2(100);
xJOB_ID         NUMBER;
xCTRL_ID        NUMBER;
xACCT_SEC       VARCHAR2(1);
xCREATEDON_DATE DATE;
xCOUNT          NUMBER := 0;

BEGIN
    BEGIN
    --Streams in the file data and assigns it to TEXT_DATA variable. 
    TEXT_DATA := UTL_FILE.FOPEN('DIR', xFILE_NAME, 'R', v_ROW_LENGTH);      
END;

--Begin LOOP to get each line and assign to cLINE to extract, assign to each variable, and insert into the table
LOOP    
    BEGIN
        --Gets each string/line up to the line terminator
        UTL_FILE.GET_LINE(TEXT_DATA, v_TEXTSTRING);
    EXCEPTION
        WHEN NO_DATA_FOUND THEN
            EXIT;
    END;

    --Each line is assigned to the variable cLINE.
    cLINE := v_TEXTSTRING;

    --Begin to parse data using SUBSTRING and INSTRING functions
    BEGIN        
        --Extracts string from cLINE position 1 up to the first occurrence, converts it to a number, and assigns it to the variable. 
        xJOB_ID         := TO_NUMBER(SUBSTR(cLINE, 1,INSTR(cLINE, ',', 1, 1)-1));

        --Extracts string from cLINE between the 1st and 2nd occurrence, converts it to a number, and assigns it to the variable.
        xCTRL_ID        := TO_NUMBER(SUBSTR(cLINE, INSTR(cLINE, ',', 1, 1)+1, INSTR(cLINE, ',', 1,2)-INSTR(cLINE, ',', 1,1)-1));

        --Extracts string from cLINE between the 2nd and 3rd occurrence and assigns it to the variable.
        xACCT_SEC       := SUBSTR(cLINE, INSTR(cLINE, ',', 1, 2) +1, INSTR(cLINE, ',', 1,3)-INSTR(cLINE, ',', 1,2) -1);  

        --Extracts string from cLINE after the last occurrence, converts it to a date, and assigns it the variable.
        xCREATEDON_DATE := TO_DATE(SUBSTR(cLINE, INSTR(cLINE, ',', 1, 3)+1), 'MM/DD/YYYY');

        INSERT INTO TABLE(JOB_ID, CTRL_ID, ACCT_SEC, CREATEDON_DATE)
        VALUES(xJOB_ID, xCTRL_ID, xACCT_SEC, xCREATEDON_DATE);
        COMMIT;

        --Counter to count the amount of inserts
        xCOUNT := xCOUNT + 1;

    EXCEPTION
        --Exception to handle the conversion of a string to a NUMBER or value is longer than the declared length of the variable.
        WHEN VALUE_ERROR THEN 
        NULL;

    END;        
END LOOP;

DBMS_OUTPUT.PUT_LINE('RECORDS INSERTED: ' || xCOUNT);    
UTL_FILE.FCLOSE(TEXT_DATA);

END;
NewComer
  • 205
  • 6
  • 12
  • 2
    Is there a reason you're creating your own solution instead of using [SQL\*Loader](https://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_concepts.htm) or an [external table](https://docs.oracle.com/cd/E11882_01/server.112/e22490/et_concepts.htm) to load the file into the table? The file is already on the server and in a recognised directory so an external table seems like the simplest approach. – Alex Poole Nov 03 '15 at 15:43
  • @AlexPoole - To be honest, not aware of SQL*Loader and not sure how I would use it and I did think of an external table, but always seem to run into issues when I run the 'SELECT * FROM EXTERNAL_TABLE' against the .CSV file (Maybe I'll tackle this again soon and bang my head a bit more). Did I make my life difficult by doing the SP? Thanks! – NewComer Nov 03 '15 at 15:50
  • 1
    Excellent point, @AlexPoole! External tables FTW! \o/ (And the added benefit that the insert becomes `insert into table_name (...) select ... from external_table_name;` which is really easy to debug...) – Boneist Nov 03 '15 at 15:52
  • @NewComer you would do better to pursue the external table option and if you get stuck, ask for help *{:-) – Boneist Nov 03 '15 at 15:54
  • 1
    External tables don't usually require any head-banging... an answer based on an external table wouldn't be appropriate for what you asked though. I'd suggest you try that approach and if you struggle ask a separate question about that? – Alex Poole Nov 03 '15 at 15:54
  • @Boneist - I will begin to do an external table and be sure to add it to this post. I'll scrap what I have done and take your advice. Even though I didn't succeed, I sure did learn a lot. Thanks again! – NewComer Nov 03 '15 at 15:57
  • 1
    @NewComer - I think you should ask a new question if you need help with an external table; boneist has already provided an answer to what you asked originally, and you may still get others; and they may be helpful to future visitors. – Alex Poole Nov 03 '15 at 16:05

1 Answers1

2

You could use REGEXP_SUBSTR instead, since that just requires one function call, rather than a series of SUBSTR and INSTR calls, eg:

(courtesy of Gary_W):

declare
  v_str varchar2(20) := 'a,,bcd';
  v_substr1 varchar2(10);
  v_substr2 varchar2(10);
  v_substr3 varchar2(10);
begin
  v_substr1 := regexp_substr(v_str, '([^,]*)(,|$)', 1, 1, NULL, 1);
  v_substr2 := regexp_substr(v_str, '([^,]*)(,|$)', 1, 2, NULL, 1);
  v_substr3 := regexp_substr(v_str, '([^,]*)(,|$)', 1, 3, NULL, 1);
  dbms_output.put_line(v_substr1||':'||v_substr2||':'||v_substr3);
end;
/

a::bcd

The above will work with strings that have null portions, as demonstrated. The search pattern has been split into two groups (aka subexpressions): [^,]* and ,|$.

The first group says: any character that isn't a comma ([^,]) that appears 0 or more times (*).

The second group says: comma or the end of the line.

So the whole pattern is looking for a set of any characters excluding the comma that may or may not exist which is followed by either a comma or the end of the line.

The last parameter in the regexp_substr is indicating that we want to pick the 1st subexpression from the search pattern to display - if we didn't include this, then you'd end up with the commas being displayed as part of the string being returned.


If you're absolutely sure that none of the elements of the string will ever be null, then the following will work:

declare
  v_str varchar2(20) := 'a,123,bcd';
  v_substr1 varchar2(10);
  v_substr2 varchar2(10);
  v_substr3 varchar2(10);
begin
  v_substr1 := regexp_substr(v_str, '[^,]+', 1, 1);
  v_substr2 := regexp_substr(v_str, '[^,]+', 1, 2);
  v_substr3 := regexp_substr(v_str, '[^,]+', 1, 3);
  dbms_output.put_line(v_substr1||':'||v_substr2||':'||v_substr3);
end;
/

a:123:bcd

This is just looking for the specified occurrence of a string of characters that aren't a comma, which is slighly easier to grok (imho!) than the search pattern used in the previous example, but is a lot less robust.

Boneist
  • 22,910
  • 1
  • 25
  • 40
  • Warning! The regex of the form `[^,]+` which happens to be the most common one you'll see when searching for how to parse a list does not work if there is a NULL element in the list! It will silently return the wrong element resulting in bad data being returned. Try setting `v_str` to `'a,,bcd'` and see the results. Use this call instead (the 4th argument is the element you want): `regexp_substr(v_str, '(.*?)(,|$)', 1, 1, NULL, 1)`. See here for more info: https://stackoverflow.com/questions/25648653/regex-to-select-nth-value-from-a-list-allowing-for-nulls/25652018#25652018 – Gary_W Nov 03 '15 at 17:13
  • Thanks for the warning, @Gary_W. I'll have to look into this tomorrow! If I use the pattern you mention, I still get commas (eg. with v_str = 'a,,bcd' I get an output of a,:,:bcd from the above procedure); is there a nifty way of removing them as part of the regexp, rather than having to add in an rtrim()? – Boneist Nov 03 '15 at 17:55
  • I don't, maybe you have a typo? I get `'a:bcd:'` with the original code you posted that is. With my suggestion I get '`a::bcd'`. – Gary_W Nov 03 '15 at 19:15
  • Hmm! Will take a look tomorrow and see if I can figure out what's going on! – Boneist Nov 03 '15 at 19:16
  • @Gary_W Ok, I worked out what I had done wrong to get the extra commas (basically, had updated the search pattern, but hadn't added in the last two params, d'oh!). I've also learnt a bit more about regular expressions, so thanks very much for that! *{:-) (That subexpression thing is handy; no idea why I've never played with it before!) I'll updated my answer to include your recommendation. Cheers again. – Boneist Nov 04 '15 at 12:16
  • Glad to hear it! I've been on a one-man campaign to alert folks to the problem of using `'[^,]+'` with NULL list elements and shudder to think of all the wrong data being returned out there. Now there's two of us to spread the word! :-) Another reference: http://stackoverflow.com/questions/31464275/split-comma-separated-values-to-columns-in-oracle/31464699#31464699 – Gary_W Nov 04 '15 at 12:29
  • @Gary_W Thanks for that 2nd link; I think I find `([^,]*)(,|$)` easier to understand than `(.*?)(,|$)` - I don't understand why the `?` is necessary. And yes, the compaign is now a two-person campaign *{;-) – Boneist Nov 04 '15 at 12:36
  • Actually I found `(.*?)` after that first post and found it easier when it came to encapsulating the regex in a function called GET_LIST_ELEMENT for reuse as referred to in one of those links. – Gary_W Nov 04 '15 at 13:08