Hard time parsing a CSV file which has column data that seem to have been loaded randomly in source system:
My input is a CSV file that has 88 columns , i need to load this to a oracle table. SQL loader throws out bad rows (that's pretty much everything in the file) hence this is not an option. I decided to go with UTL_FILE in PL/SQL procedure which will give me the flexibility to parse each row and split data into columns.
A mock record in CSV would look like this, the 5th COl and 7th COL are difficult to deal with:
RECORD 1: PS-ASXGBDFEG,aOA2000x29,NULL,NULL,"899245, 892345",NULL,4-7-12: Pre-Shipment report NKGWT00065104 received for keycode 8269353 with AQL defect levels of 1/0 Critical, 17/10 Major and 24/14 Minor found.,23/11/2011 12:00:00 AM
RECORD 2: PS-ASXGANBAG,aOA2000x39,NULL,NULL,898245",NULL,22-11--11: Pre shipment inspection report Intertek SHIJ25252253 for keycode 8241233. AQL levels: 0/0 critical, 16/10 major and 4/14 minor defects. The pre-shipment report has failed due to poor workmanship and studs out of alignment. Manufacturing quality faults are a concern; this must improve with future production. Dimensions out of tolerance are commercially acceptable for this shipment. All faulty garments must be removed from bulk or repaired. Government documents for approval of the Australian flag on print and for ?Permission to Import?Judy,23/11/2011 12:00:00 AM
RECORD 3: PS-ASXGNDBAG,aOA2000x42,NULL,NULL,"7771965(Black), 7771958(White)",NULL,NULL,NULL
RECORD 4: PS-ASXGLPBAG,aOA2000x80,NULL,NULL,8272700/E,NULL,NULL,NULL
I used the basic parsing (which apparently fails miserably )
F := UTL_FILE.FOPEN ('FILE_DIR', 'QAProductSpecs.csv', 'R',32767);
IF UTL_FILE.IS_OPEN(F) THEN
LOOP
UTL_FILE.GET_LINE(F, V_LINE, 10000);
V_LENGTH := (INSTR(V_LINE, ',', 1, 1));
PRODUCT_SPEC_ID := SUBSTR(V_LINE,1,V_LENGTH-1);
--CHOP OUT PRODUCT_SPEC_ID column from the CURRENT ROW
V_LINE := SUBSTR(V_LINE,V_LENGTH+1);
V_LENGTH := (INSTR(V_LINE, ',', 1, 1));
PRODUCT_SPEC_ID2 := SUBSTR(V_LINE, 1, V_LENGTH-1);
--CHOP OUT PRODUCT_SPEC_ID2 column from the CURRENT ROW
V_LINE := SUBSTR(V_LINE,V_LENGTH+1);
.. So on for other columns.. and finally load the record once all the 88 columns value for a row are in the respective variables..
end loop
end if
I also tried DBMS_UTILITY.COMMA_TO_TABLE from the link below , but without a proper pattern this too did not help much [Ignore commas between quotations when reading CSV Oracle stored procedure UTL_FILE
I am unable to find a pattern to extract column as col5 and col7(free text or NULL) have quite a random data pattern in them..
Need help to parse this one.. or any alternative way to load this CSV to ORACLE table that database version is 10.2 or can it not be done in PL/SQL at all :(