i'm trying to determine the amount of line in a CSV-File which i'm about to import via external table. i need to know the total amount of lines to determine within my procedure to react accordingly.
question is, what is the fastest way to do that?
i'm trying to determine the amount of line in a CSV-File which i'm about to import via external table. i need to know the total amount of lines to determine within my procedure to react accordingly.
question is, what is the fastest way to do that?
The two obvious options are to just count the rows from the external table, or to read the file line by line and count those. With small files there isn't much difference, and the line-by-line approach can be slight faster; but with bigger files the native nature of the external table query makes it significantly faster.
But you may be able to use the file size as an approximation for the number of lines - it depends how accurate you need to be. You can do that directly, or via a BFILE, and from a BFILE you can load a CLOB more quickly that UTL_FILE reads the file - and can then count lines by looking for newline characters.
Test block:
declare
l_time pls_integer;
l_count pls_integer;
l_file utl_file.file_type;
l_line varchar2(32767);
l_bfile bfile;
l_clob clob;
l_dest_offset pls_integer := 1;
l_src_offset pls_integer := 1;
l_lang_context pls_integer := dbms_lob.default_lang_ctx;
l_warning pls_integer;
l_exists boolean;
l_file_length pls_integer;
l_block_size pls_integer;
begin
l_time := dbms_utility.get_cpu_time;
select count(*) into l_count from t42_ext;
dbms_output.put_line('external table: count ' || l_count
|| ' took ' || (dbms_utility.get_cpu_time - l_time));
l_time := dbms_utility.get_cpu_time;
l_count := 0;
l_file := utl_file.fopen('D42', 'root.dat', 'r', 32767);
loop
begin
utl_file.get_line(l_file, l_line, 32767);
exception
when no_data_found then
exit;
end;
l_count := l_count + 1;
end loop;
utl_file.fclose(l_file);
dbms_output.put_line('utl_file read loop: count ' || l_count
|| ' took ' || (dbms_utility.get_cpu_time - l_time));
l_time := dbms_utility.get_cpu_time;
utl_file.fgetattr('D42', 'root.dat', l_exists, l_file_length, l_block_size);
dbms_output.put_line('utl_file fgetattr: size ' || l_file_length
|| ' took ' || (dbms_utility.get_cpu_time - l_time));
l_time := dbms_utility.get_cpu_time;
l_bfile := bfilename('D42', 'root.dat');
dbms_output.put_line('bfile getlength: size ' || dbms_lob.getLength(l_bfile)
|| ' took ' || (dbms_utility.get_cpu_time - l_time));
l_time := dbms_utility.get_cpu_time;
dbms_lob.open(l_bfile, dbms_lob.lob_readonly);
dbms_lob.createtemporary(l_clob, false);
dbms_lob.loadclobfromfile(l_clob, l_bfile, dbms_lob.getLength(l_bfile),
l_dest_offset, l_src_offset, dbms_lob.default_csid, l_lang_context, l_warning );
dbms_lob.close(l_bfile);
dbms_output.put_line('clob getlength: size ' || dbms_lob.getLength(l_bfile)
|| ' took ' || (dbms_utility.get_cpu_time - l_time));
l_time := dbms_utility.get_cpu_time;
dbms_output.put_line('clob length diff: count '
|| (length(l_clob) - length(replace(l_clob, chr(10))))
|| ' took ' || (dbms_utility.get_cpu_time - l_time));
l_time := dbms_utility.get_cpu_time;
dbms_output.put_line('clob regexp_count: count '
|| regexp_count(l_clob, chr(10))
|| ' took ' || (dbms_utility.get_cpu_time - l_time));
dbms_lob.freetemporary(l_clob);
end;
/
Running against two sample files (11gR2 on Linux):
PL/SQL procedure successfully completed.
external table: count 1024 took 1
utl_file read loop: count 1024 took 1
utl_file fgetattr: size 189440 took 0
bfile getlength: size 189440 took 0
clob getlength: size 189440 took 2
clob length diff: count 1024 took 0
clob regexp_count: count 1024 took 22
PL/SQL procedure successfully completed.
external table: count 1048576 took 85
utl_file read loop: count 1048576 took 1403
utl_file fgetattr: size 193986560 took 0
bfile getlength: size 193986560 took 0
clob getlength: size 193986560 took 742
clob length diff: count 1048576 took 374
clob regexp_count: count 1048576 took 21808
There may be some effects from operating system and/or storage device file caches too. But based on this test, checking the file size is quickest, followed fairly closely by doing a native count of the rows in the external table. Counting the line breaks in a loaded CLOB is significantly slower, and a line-by-line count via UTL_FILE is significantly slower still (not surprisingly). Using a regular expression count of a CLOB is too slow to consider, even for smaller files really.
You should test on your environment and with realistic data though.
i solved it the following way
i implemented a shell script (rowcount.sh) that resides on the db server with the following content
/usr/bin/cat $1 | /usr/bin/wc -l
created a directory inside oracle
CREATE DIRECTORY exec_dir AS '/home/oracle/';
GRANT READ, EXECUTE ON DIRECTORY exec_dir TO stc;
implemented the following function to make use of external tables and the functionality of the preprocesser.
CREATE OR REPLACE PROCEDURE file_row_count(filename IN VARCHAR2, line_count OUT NUMBER)
IS
l_random_table_name VARCHAR2(30):= 'import_'||SUBSTR(SYS_GUID(),1,15);
l_count NUMBER;
BEGIN
EXECUTE IMMEDIATE 'CREATE TABLE '||l_random_table_name||'(
line_count NUMBER
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY EXT_CSV_DIR
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
PREPROCESSOR exec_dir:''rowcount.sh''
NOBADFILE
NODISCARDFILE
NOLOGFILE
FIELDS
)
LOCATION (EXT_CSV_DIR:'''||filename||''')
)
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING'
;
EXECUTE IMMEDIATE 'SELECT LINE_COUNT FROM '||l_random_table_name INTO line_count;
EXECUTE IMMEDIATE 'DROP TABLE '||l_random_table_name;
EXCEPTION
WHEN OTHERS THEN
dbms_output.put_line(to_char(SQLERRM));
dbms_output.put_line('try to drop temp table'||l_random_table_name);
SELECT COUNT(*) INTO l_count
FROM SYS.USER_TABLES
WHERE table_name = UPPER(l_random_table_name)
;
IF l_count > 0 THEN
EXECUTE IMMEDIATE 'DROP TABLE '||l_random_table_name;
END IF;
END file_row_count;
/
make use of the function for files you know you have to import
DECLARE
L_FILENAME VARCHAR2(200);
L_LINE_COUNT NUMBER;
BEGIN
L_FILENAME := 'data0.csv';
FILE_ROW_COUNT(
FILENAME => L_FILENAME,
LINE_COUNT => L_LINE_COUNT
);
DBMS_OUTPUT.PUT_LINE('LINE_COUNT = ' || L_LINE_COUNT);
--rollback; l
END;