
I faced a situation where the data for one of the columns to be uploaded exceeded 255 characters, and SQL*Loader threw the error "Field in data file exceeds maximum length".

I found a way to fix it; please see the link here explaining the logic.

The question is: in a control file, is there any difference between

Comments                   CHAR(255) "TRIM(:Comments)" ,

and

Comments                   CHAR "TRIM(:Comments)" ,

when it comes to the internal workings of sqlldr, or do the two mean the same thing?

Also, because the integrity of the data file cannot be trusted, while uploading we create a table with all columns as VARCHAR2(255) (which I will fix for columns longer than 255), and I never specify the CHAR length in the control file.
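
For illustration, the staging table we create looks something like this (hypothetical column names; every column defaults to 255):

CREATE TABLE STAGE_DATA (     -- hypothetical staging table
  COMMENTS    VARCHAR2(255),
  OTHER_COL   VARCHAR2(255)
);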


I would like to know whether the difference between

using the default 255 for all columns

vs

keeping a little extra over the expected length (the actual target table column lengths, which might range from 10 to 150, plus a buffer of 50 or 100)

is significant if I run sqlldr at least 2,000 times a day on small data files averaging 250 records each.
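
To make the comparison concrete, for a hypothetical column whose longest expected value is around 80 characters, the two options in the control file would be:

Comments                   CHAR(255) "TRIM(:Comments)" ,   -- blanket 255 default
Comments                   CHAR(180) "TRIM(:Comments)" ,   -- expected 80 + 100 buffer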

Thanks in advance for the clarification.

pOrinG

1 Answer


I think they are the same: if you omit the length, SQL*Loader defaults a character field to 255 bytes, so internally that is the buffer size as well. After dealing with this enough times, I created a rough utility script that generates a skeleton control file using each column's actual size. It gets me 90% of the way there, with no column-sizing issues. It may also be better on memory, since it won't reserve the full 255 characters when the column is smaller (see the rough bind-array arithmetic after the sample output below). Give it a try if you want; it may give you some ideas anyway.

/********************************************************************************************************
    Name:       GEN_CTL_FILE

    Desc:       Generates a skeleton control file for loading data via SQL*Loader.

    Args:       tablename_in IN VARCHAR2, delim_in IN VARCHAR2 DEFAULT '|'

    Returns:    None.

    Usage:      utilities.gen_ctl_file('tablename');

    Notes:      Prints a skeleton control file.

                If a template for a fixed-length data file is desired, use 'FIXED' for the delim_in string.
                FIXED needs more work to put actual lengths in.  For now just placeholders.  

                Example usage:

                set serveroutput on;
                execute utilities.gen_ctl_file('test', utilities.TAB);

   REVISIONS:
   Ver        Date        Author           Description
   ---------  ----------  ---------------  ------------------------------------
   1.1         6/6/2013    gary_w          - Created procedure.
   1.2         10/8/2013   gary_w          - Fixed decode statement. 
                                           - Added option to generate a fixed-length template.
   ************************************************************************************************************************/
  PROCEDURE GEN_CTL_FILE(tablename_in IN VARCHAR2, delim_in VARCHAR2 DEFAULT '|') IS
    ERRNULLTABLENAME     CONSTANT NUMBER        := -20103; -- User-defined error numbers and messages.
    ERRNULLTABLENAMEMSG  CONSTANT VARCHAR2(100) := 'A table name is required.';
    USAGE                CONSTANT VARCHAR2(100) := '*   USAGE: UTILITIES.GEN_CTL_FILE(tablename_in IN VARCHAR2, fieldsep_in VARCHAR2 DEFAULT ''|'')';
    v_delim                       VARCHAR2(20)  := NVL(delim_in, '|');
    CR                   CONSTANT VARCHAR2(1)   := CHR(10); -- newline; assumed to be a package-level constant in the original UTILITIES package

    CURSOR COL_CUR  IS
      SELECT COLUMN_NAME, 
      DECODE(COLUMN_ID, 1, ' ', ',') || RPAD(COLUMN_NAME, 32) || case upper(v_delim)
        when 'FIXED' then 'POSITION(99:99) '
        else NULL
      end|| DECODE(DATA_TYPE,
             'VARCHAR2', 'CHAR('||DATA_LENGTH||') NULLIF(' || COLUMN_NAME || '=BLANKS)',
             'FLOAT', 'DECIMAL EXTERNAL NULLIF(' || COLUMN_NAME || '=BLANKS)',
             'NUMBER', DECODE(DATA_PRECISION,
                        0, 'INTEGER EXTERNAL NULLIF (' || COLUMN_NAME || '=BLANKS)',
                        DECODE(DATA_SCALE,
                               0, 'INTEGER EXTERNAL NULLIF (' || COLUMN_NAME || '=BLANKS)',
                               'DECIMAL EXTERNAL NULLIF (' || COLUMN_NAME || '=BLANKS)')),
             'DATE', 'DATE "MM/DD/YYYY" NULLIF (' || COLUMN_NAME || '=BLANKS)',
             data_type)
               AS COL_DATA
      FROM  USER_TAB_COLUMNS
      WHERE TABLE_NAME = UPPER(tablename_in)
      ORDER BY COLUMN_ID;

  BEGIN

    IF tablename_in IS NULL THEN
      RAISE_APPLICATION_ERROR(ERRNULLTABLENAME, ERRNULLTABLENAMEMSG || CR || USAGE);
    END IF;

    DBMS_OUTPUT.PUT_LINE('--');
    DBMS_OUTPUT.PUT_LINE('-- NOTE - When using DIRECT=TRUE to perform block inserts to a table,');
    DBMS_OUTPUT.PUT_LINE('--        the table''s triggers will not be used! Plan accordingly to');
    DBMS_OUTPUT.PUT_LINE('--        manually perform the trigger actions after loading, if needed.');
    DBMS_OUTPUT.PUT_LINE('--');
    DBMS_OUTPUT.PUT_LINE('OPTIONS (DIRECT=TRUE)');
    DBMS_OUTPUT.PUT_LINE('UNRECOVERABLE');
    DBMS_OUTPUT.PUT_LINE('LOAD DATA');
    DBMS_OUTPUT.PUT_LINE('APPEND');
    DBMS_OUTPUT.PUT_LINE('INTO TABLE ' || UPPER(tablename_in));
    DBMS_OUTPUT.PUT_LINE('EVALUATE CHECK_CONSTRAINTS');
    if upper(v_delim) != 'FIXED' then
      DBMS_OUTPUT.PUT_LINE('FIELDS TERMINATED BY ' || '''' || v_delim || '''');
      DBMS_OUTPUT.PUT_LINE('OPTIONALLY ENCLOSED BY ''"'' ');
      DBMS_OUTPUT.PUT_LINE('TRAILING NULLCOLS');
    end if;
    DBMS_OUTPUT.PUT_LINE('(');

    -- The cursor for loop construct implicitly opens and closes the cursor.
    FOR COL IN COL_CUR
    LOOP
      IF COL.COLUMN_NAME != 'LOAD_DATE' THEN
        IF COL.COLUMN_NAME = 'LOAD_SEQ_ID' THEN
          dbms_output.put_line(','||RPAD('LOAD_SEQ_ID', 32)||'CONSTANT 0');
        ELSE
          DBMS_OUTPUT.PUT_LINE(COL.COL_DATA);
        END IF;
      END IF;
    END LOOP;
    DBMS_OUTPUT.PUT_LINE(')' || CHR(10));

  EXCEPTION
    WHEN OTHERS THEN
      RAISE;
  END; -- GEN_CTL_FILE

The output looks like this:

--
-- NOTE - When using DIRECT=TRUE to perform block inserts to a table,
--        the table's triggers will not be used! Plan accordingly to
--        manually perform the trigger actions after loading, if needed.
--
OPTIONS (DIRECT=TRUE)
UNRECOVERABLE
LOAD DATA
APPEND
INTO TABLE TEST
EVALUATE CHECK_CONSTRAINTS
FIELDS TERMINATED BY '|'
OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
 COLA                            CHAR(200) NULLIF(COLA=BLANKS)
,COLB                            CHAR(100) NULLIF(COLB=BLANKS)
,COLC                            CHAR(500) NULLIF(COLC=BLANKS)
,COLD                            DECIMAL EXTERNAL NULLIF (COLD=BLANKS)
,COLE                            CLOB
)
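
On the memory question raised in the comments: for conventional-path loads, the Oracle documentation sizes the bind array as roughly ROWS times the sum of each field's buffer length (plus a small per-field overhead). A rough, hypothetical comparison with 20 character columns and the default ROWS=64:

  all columns CHAR(255):        64 * 20 * 255 bytes ~= 326 KB per bind array
  columns sized to ~80 bytes:   64 * 20 * 80  bytes ~= 102 KB per bind array

Note, though, that the generated control file above uses DIRECT=TRUE, and direct-path loads use column arrays rather than the bind array, so this arithmetic applies to conventional-path loads.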

If you tweak it, please share your changes.

Gary_W
  • Thanks for your answer. The issue is not about writing the script. Initially, when I decided to create this utility that dynamically creates the control file and loads the data, I used the actual column sizes. But the data file provided was very untrustworthy, so we decided to use a default of 255 while uploading, and instead of loading directly into the target we load into a temporary table that has the same structure as the target table but with all columns as VARCHAR2(255). – pOrinG Jul 28 '17 at 04:55
  • But lately I have been having second thoughts about the 255 default and want to keep the actual size of the column + 100 (an extra buffer) for untrustworthy data. `PS: don't worry about the date and number columns.` I handle those differently. – pOrinG Jul 28 '17 at 04:57
  • Oh, I see now. If you are loading into a staging table first, then validating from there and inserting into the real table, why not make the VARCHAR columns huge, like 1000? They only use the space they need. I have a similar issue with some comment columns where the screen field allows entry of carriage returns, HTML characters, etc. I have a function I call during the load that strips out invalid characters: `COLA CHAR(200) "UTILITIES.CLEAN_COMMENTS(:COLA)"` (a rough sketch of such a function follows this comment thread). Maybe you can do that. – Gary_W Jul 28 '17 at 12:11
  • Your point about using huge buffers (like 1000) is correct; however, this utility executes around 2,000+ times a day on an average of 250 records each. The reason I started having second thoughts about the 255 default buffer is to reduce the load, buffer memory used, etc. Anyway, I changed my program logic to use the actual column lengths + 100 while creating the temporary table and control file. It should not cause any issues now. – pOrinG Jul 28 '17 at 14:20
  • VARCHAR2 only uses what it needs so it should not impact memory to make the values large. – Gary_W Jul 28 '17 at 14:33
  • Check the "Minimizing Memory Requirements for Bind Arrays" section of this [document](https://docs.oracle.com/cd/A97630_01/server.920/a96652/ch05.htm) and advise. – pOrinG Jul 28 '17 at 15:00
  • Interesting. The control file does use CHAR and not VARCHAR2, so maybe you are right. I'll have to look into this! That document is for Oracle 9i; I wonder if memory use has been improved in 11g. – Gary_W Jul 28 '17 at 15:07
  • Try this question on why to use CHAR instead of VARCHAR: [SQL Loader error: "Variable length field exceeds maximum length"](https://stackoverflow.com/questions/10386020/sql-loader-error-variable-length-field-exceeds-maximum-length). I didn't get time to go through the reference in one of the answers, but I will soon. – pOrinG Jul 28 '17 at 15:09
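
For reference, a minimal sketch of what a cleaning function like the `UTILITIES.CLEAN_COMMENTS` mentioned above might look like (hypothetical; Gary_W's actual implementation was not posted). It replaces line breaks with spaces and strips any remaining non-printable characters:

  FUNCTION CLEAN_COMMENTS(str_in IN VARCHAR2) RETURN VARCHAR2 IS
  BEGIN
    -- Replace carriage returns and linefeeds with spaces, then strip
    -- any remaining non-printable characters.
    RETURN REGEXP_REPLACE(TRANSLATE(str_in, CHR(13) || CHR(10), '  '),
                          '[^[:print:]]');
  END CLEAN_COMMENTS;

It would be called from the control file as in the comment above: `COLA CHAR(200) "UTILITIES.CLEAN_COMMENTS(:COLA)"`.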