0

Example data:

1, test, "xy, yz", dog, cat
2, test2, xy, fish, bear

As you can see above, the second column may or may not be enclosed in quotes with a comma as part of the string.

I am simply trying to find a way to ignore the comma within the quotes, if either is there. I am using UTL_FILE and can't use external tables, I know I know, but I am not the one who gets to make decisions.

I haven't really "tried" anything yet, because I haven't been able to find anything to try in this particular scenario. I have code that extracts substrings between each comma, but obviously, will split "xy, yz" into 2 separate strings. I have been able to get it to work on ONE row, by using " as the delimiter for that specific column, but that goes to hell, when the next row has no quotes for the second column.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
user2026989
  • 11
  • 1
  • 4
  • sorry what are you trying to do? Are you writing an oracle stored proc that reads csv? – gerrytan Jan 30 '13 at 21:28
  • correct. basically its for a one time data migration, i normally don't really work on the oracle side, and I am finding regex doesn't seem to work the same way in oracle. Essentially Strings that contain a comma appear in the csv as "xxx, xxx" when parsing the csv by comma, it obviously doesnt know the "xxx, xxx" is one string and parses it in half. – user2026989 Jan 30 '13 at 22:40
  • I have come up with a rather messy way of replacing the commas within quotes, (then replace them back AFTER the string has been split). i.e. looping through the line replacing comma by comma within the quotes, then parsing the line with the comma delimiter, so I have my separate strings. Then on the columns that have commas replace the comma replacement character back with a comma. This "works" but it is ugly, I am still on the lookout for something more efficient. – user2026989 Jan 30 '13 at 22:45
  • If you aren't allowed to use external tables for some reason (despite them probably being the ideal tool for this), are you also banned from using SQL*Loader? – Alex Poole Jan 31 '13 at 09:58
  • nope. I cant use SQL*Loader either. My hands are basically tied, and unfortunately the solution i did come up with, doesn't work in the event there are two fields in a line that have commas as part of a string. – user2026989 Jan 31 '13 at 19:21
  • OK, is it possible to get the input file delivered with all fields enclosed in double-quotes (i.e. `"1","test","xy, yz","dog","cat"`)? – Alex Poole Jan 31 '13 at 19:51

2 Answers2

3

If you can get the input file delivered with all fields enclosed in double-quotes (or can translate it after reading; my regex isn't good enough) you can use dbms_utility.comma_to_table; the fields need to be enclosed because (as noted here) the tokens have to be valid object names, so 1 causes an error. For example:

declare
    file utl_file.file_type;
    list varchar2(120);
    tablen binary_integer;
    tab dbms_utility.lname_array;
begin
    file := utl_file.fopen('MY_DIR', 'test1.csv', 'R');
    loop
        begin
            utl_file.get_line(file => file, buffer => list);
        exception
            when no_data_found then
                exit;
        end;
        dbms_output.put_line('Raw list: ' || list);

        dbms_utility.comma_to_table(list => list,
            tablen => tablen, tab => tab);

        for i in 1..tablen
        loop
            dbms_output.put_line('Column ' || i || ': '
                || replace(tab(i), '"'));
        end loop;
    end loop;
    utl_file.fclose(file);
end;
/

gives:

Raw list: "1","test","xy, yz","dog","cat"
Column 1: 1
Column 2: test
Column 3: xy, yz
Column 4: dog
Column 5: cat
Raw list: "2","test2","xy","fish","bear"
Column 1: 2
Column 2: test2
Column 3: xy
Column 4: fish
Column 5: bear
Raw list: "3","test3","ab, cd","rabbit, rabbit","duck"
Column 1: 3
Column 2: test3
Column 3: ab, cd
Column 4: rabbit, rabbit
Column 5: duck

If they aren't quoted then you can use a regex (pattern from here):

declare
    file utl_file.file_type;
    list varchar2(120);
    pattern varchar2(15) := '("[^"]*"|[^,]+)';
    c sys_refcursor;
    i number;
    f varchar2(20);
begin
    file := utl_file.fopen('MY_DIR', 'test2.csv', 'R');
    loop
        begin
            utl_file.get_line(file => file, buffer => list);
        exception
            when no_data_found then
                exit;
        end;

        dbms_output.put_line('Raw list: ' || list);

        open c for
             select level as col,
                 regexp_substr(list, pattern, 1, rownum) split  
             from dual
             connect by level <= length(regexp_replace(list, pattern))  + 1;

        loop
            fetch c into i, f;
            exit when c%notfound;
            dbms_output.put_line('Column ' || i || ': ' || replace(f, '"'));
        end loop;
        close c;

    end loop;
    utl_file.fclose(file);
end;
/

which gives:

Raw list: 1,test,"xy, yz",dog,cat
Column 1: 1
Column 2: test
Column 3: xy, yz
Column 4: dog
Column 5: cat
Raw list: 2,test2,xy,fish,bear
Column 1: 2
Column 2: test2
Column 3: xy
Column 4: fish
Column 5: bear
Raw list: 3,test3,"ab, cd","rabbit, rabbit",duck
Column 1: 3
Column 2: test3
Column 3: ab, cd
Column 4: rabbit, rabbit
Column 5: duck

I'm not sure if you really have spaces between the fields as shown in the question. If so, the first method still works and you can add a trim() around tab(i). The second method breaks down so would need a bit of adjustment...

Community
  • 1
  • 1
Alex Poole
  • 183,384
  • 11
  • 179
  • 318
0

Oracle SQL Developer tool came with a data import wizard that supports CSV. Have a look at § 5.34 of the manual. It is probably easier than processing your csv manually. If any processing need to be done you can do it via excel / perl.

gerrytan
  • 40,313
  • 9
  • 84
  • 99