I'm trying to upload a file to Vertica using the vsql CLI. My command looks like this (it's all one line, but I'm breaking it up with backslashes here for readability):
vsql "-c set timezone to 'UTC'; \
copy etl.test_data1 (col1,col2,col3) \
from local 'test_data1.csv.gz' \
gzip delimiter E'\\t' \
exceptions '/home/error/etl.test_data1.err' \
NULL AS 'NULL' \
NO ESCAPE \
rejected data '/home/rejected/etl.test_data1.rejected'" \
-p5433 -hverticahostname -Uusername -ddatabase -vON_ERROR_STOP=ON
When I run that command, it executes with no error, but no rows are loaded and the data goes straight into the rejected file.
When I run the equivalent CSV-load command on the un-gzipped version of the same file, it loads the data into the table without rejections:
vsql "-c set timezone to 'UTC'; \
copy etl.test_data1 (col1,col2,col3) \
from local 'test_data1.csv' \
delimiter ',' \
exceptions '/home/error/etl.test_data1.err' \
NULL AS 'NULL' \
NO ESCAPE \
rejected data '/home/rejected/etl.test_data1.rejected'" \
-p5433 -hverticahostname -Uusername -ddatabase -vON_ERROR_STOP=ON
I tried running the command with no explicit delimiter set and got the same result. I also tried creating the gzip with both the UNIX gzip command and Python's gzip module (roughly as shown below), in case the file had somehow been corrupted; same result either way.
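For reference, I created the compressed file along these lines (the exact invocations are approximate):

# UNIX gzip, keeping the original file
gzip -c test_data1.csv > test_data1.csv.gz

# equivalent route through Python's gzip module
python -c "import gzip, shutil; f = open('test_data1.csv', 'rb'); g = gzip.open('test_data1.csv.gz', 'wb'); shutil.copyfileobj(f, g); g.close(); f.close()"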
The file I would eventually like to upload via a job is massive, and uncompressed CSV isn't a performant option for a regular upload. Any suggestions as to why the gzip load is failing?
EDIT: Solution in the comments: gzipping a CSV doesn't change the delimiter, so the load still needs to use ',' as the delimiter, not tab!
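In other words, the working gzip load should look like the original command with only the delimiter changed:

vsql "-c set timezone to 'UTC'; \
copy etl.test_data1 (col1,col2,col3) \
from local 'test_data1.csv.gz' \
gzip delimiter ',' \
exceptions '/home/error/etl.test_data1.err' \
NULL AS 'NULL' \
NO ESCAPE \
rejected data '/home/rejected/etl.test_data1.rejected'" \
-p5433 -hverticahostname -Uusername -ddatabase -vON_ERROR_STOP=ON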