Replace data of one column with substring of another column in sql loader

Question

I am loading data from a csv file into a table using sqlldr. There is one column which is not present in every row of the csv file. The data needed to populate this column is present in one of the other columns of the row. I need to split (split(.) )that column's data and populate into that column.

Like:-

 column1:- abc.xyz.n

So the unknown column(column2) should be

 column2:- xyz

Also, there is another column which is present in the row but it's not what I want to input into the table. It is also needed to be populated from column1. But there are around 50 if-else cases in that. Is decode preferable to do this?

column1:- abc.xyz.n

Then,

column2:- hi if(column1 has 'abc')
             if(column1 has 'abd' then 'hello')

like this there are around 50 if-else cases.

Thanks for help.

Gary_W · Accepted Answer · 2016-03-02T17:15:46.780

2

For the first part of your question, define the column1 data in the control file as BOUNDFILLER with a name that does not match a table column name which tells sqlldr to remember it but don't use it. If you need to load it into a column, use the column name plus the remembered name. For column2, use the remembered BOUNDFILLER name in an expression where it returns the part you need (in this case the 2nd field, allowing for NULLs):

  x        boundfiller,
  column1  EXPRESSION  ":x",
  column2  EXPRESSION  "REGEXP_SUBSTR(:x, '(.*?)(\\.|$)', 1, 2, NULL, 1)"

Note the double backslash is needed else it gets removed as it gets passed to the regex engine from sqlldr and the regex pattern is altered incorrectly. A quirk I guess.

Anyway after this column1 ends up with "abc.xyz.n" and column2 gets "xyz".

For the second part of your question, you could use an expression as already shown but call a custom function you create where you pass the extracted value and it would return the searched value from a lookup table. You certainly don't want to hardcode your 50 lookup values. You could do the same thing basically in a table level trigger too. Note I show a select statement for an example only but this should be encapsulated in a function for reusability and maintainability:

Just to show you can do it:

 col2  EXPRESSION  "(select 'hello' from dual where REGEXP_SUBSTR(:x, '(.*?)(\\.|$)', 1, 2, NULL, 1) = 'xyz')"

The right way:

 col2  EXPRESSION  "(myschema.mylookupfunc(REGEXP_SUBSTR(:x, '(.*?)(\\.|$)', 1, 2, NULL, 1)))"

mylookupfunc returns the result of looking up 'xyz' in the lookup table, i.e. 'hello' as per your example.

edited Mar 02 '16 at 17:15

answered Mar 02 '16 at 17:10

Gary_W

9,933
1
22
40

I will try to implement and get back. I was thinking about using case statements for second part and using SUBSTR(:column1,INSTR(:column1,'.'),INSTR(:column1,'.',-1)) for first part. Will these work too ? – Jaydeep Mar 03 '16 at 05:23
I have tried your method, works great and using a function for second part has made the job faster. Thanks – Jaydeep Mar 03 '16 at 12:31
@jaydeep glad it worked for you. Regarding your first comment, both could work but would be confusing and more maintenance in the long run should changes be needed. This way a non-coder could update lookup values in the table should they change for instance. – Gary_W Mar 03 '16 at 13:35
if one of my column is a#b#c#d can you help with expression to split it into four columns as a, b, c and d as done above – Jaydeep Mar 11 '16 at 08:40
@Jaydeep It would be basically the same as above, just change the delimiter from a period to the pound sign, and the 4th argument to REGEXP_SUBSTR() would change for each field you want. Call it for each field. – Gary_W Mar 11 '16 at 14:00
Hi gary, I tried what you said. (null) is coming time and again, can you please test and confirm if REGEXP_SUBSTR(:z, '(#*?)(\\#|$)', 1, 1, NULL, 1);REGEXP_SUBSTR(:z, '(#*?)(\\#|$)', 1, 2, NULL, 1);REGEXP_SUBSTR(:z, '(#*?)(\\#|$)', 1, 3, NULL, 1);REGEXP_SUBSTR(:z, '(#*?)(\\#|$)', 1, 4, NULL, 1) is the right way to do it – Jaydeep Mar 14 '16 at 07:46
REGEXP_SUBSTR('a#b#c#d', '[^#]+', 1, 1),REGEXP_SUBSTR('a#b#c#d', '[^#]+', 1, 2),REGEXP_SUBSTR('a#b#c#d', '[^#]+', 1, 3),REGEXP_SUBSTR('a#b#c#d', '[^#]+', 1, 4) has worked. Anyway thanks – Jaydeep Mar 14 '16 at 09:13
@jaydeep this should be a new question but the last form you mention will not work when a NULL is present in the list (try this for data and see`'a##c#d'`). The right way you almost had first but with a cpl of syntax errors:`(REGEXP_SUBSTR(:z, '(.*?)(#|$)', 1, 1, NULL, 1)`. The regex `'[^#]+'` you often see for parsing a string DOES NOT WORK when a NULL list item is present and should be avoided: http://stackoverflow.com/a/31464699/2543416 – Gary_W Mar 14 '16 at 14:10
Thanks for the insight Gary. – Jaydeep Mar 15 '16 at 05:24

Replace data of one column with substring of another column in sql loader

1 Answers1