3

I have data in CSV file I am trying to insert into a postgresSQL table using pgloader. The input file is from MS SQL server export, and NULL values are already explicitly cast as NULL.

My pgloader scripts seems to fail for the keywords NULL, noticeably for integer and timestamp fields.

I really don't know what I am missing. Your help will be much appreciated.

I can successfully insert into the table from psql console:

insert into raw.a2
(NUM , F_FILENO , F_FOLIONO , F_DOC_TYPE , F_DOCDATE , F_BATCH , F_BOX , F_BLUCPY , F_ROUTOPOST , F_ROUTOUSR , F_WFCREATE , LINKEDFILE , DATECREATE , USERCREATE , DATEUPDATE , USERUPDATE , MEDIA , PGCOUNT , GROUPNUM , SUBJECT , PRI , F_FILECAT)
values
(
16,'18',3,'Nomination Details',NULL,NULL,NULL,1,NULL,NULL,1,'00000016.TIF','2011-02-08 13:02:11.000','isaac','2012-01-12 08:52:31.000','henrey','Multi',4,1.0,0,'-',NULL
);

INSERT 0 1

file-sample

1,'6',1,'Details',2011-02-22 00:00:00.000,NULL,NULL,1,NULL,NULL,2,'00000001.TIF',2011-02-08 09:42:24.000,'kevin',2011-10-27 09:08:42.000,'james','Multi',1,1.0,0,'-',NULL
2,'6',2,'Bio data',NULL,NULL,NULL,1,NULL,NULL,2,'00000002.TIF',2011-02-08 10:25:11.000,'kevin',2012-11-19 16:20:49.000,'pattie','Multi',4,1.0,0,'-',NULL
4,'10',1,'Details',2011-02-22 00:00:00.000,NULL,NULL,1,NULL,NULL,2,'00000004.TIF',2011-02-08 10:43:38.000,'kevin',2014-07-18 10:46:06.000,'brian','Multi',1,1.0,0,'-',NULL

pgloader commands

pgloader --type csv --with truncate --with "fields optionally enclosed by '''" --with "fields terminated by ','" --set "search_path to 'raw'" - "postgresql://postgres:postgres@localhost/doc_db?a2" < null_test

Table

                               Table "raw.a2"
   Column    |            Type             | Collation | Nullable | Default 
-------------+-----------------------------+-----------+----------+---------
 num         | integer                     |           | not null | 
 f_fileno    | character varying(15)       |           |          | 
 f_foliono   | integer                     |           |          | 
 f_doc_type  | character varying(50)       |           |          | 
 f_docdate   | timestamp without time zone |           |          | 
 f_batch     | integer                     |           |          | 
 f_box       | integer                     |           |          | 
 f_blucpy    | integer                     |           |          | 
 f_routopost | integer                     |           |          | 
 f_routousr  | character varying(49)       |           |          | 
 f_wfcreate  | integer                     |           |          | 
 linkedfile  | character varying(255)      |           |          | 
 datecreate  | timestamp without time zone |           |          | 
 usercreate  | character varying(50)       |           |          | 
 dateupdate  | timestamp without time zone |           |          | 
 userupdate  | character varying(50)       |           |          | 
 media       | character varying(5)        |           |          | 
 pgcount     | smallint                    |           |          | 
 groupnum    | double precision            |           |          | 
 subject     | smallint                    |           |          | 
 pri         | character varying(1)        |           |          | 
 f_filecat   | character varying(50)       |           |          | 

Indexes:

"a2_pkey" PRIMARY KEY, btree (num)

Output/Error

2019-07-24T05:55:24.231000Z WARNING Target table "\"raw\".\"a2\"" has 1 indexes defined against it.
2019-07-24T05:55:24.237000Z WARNING That could impact loading performance badly.
2019-07-24T05:55:24.237000Z WARNING Consider the option 'drop indexes'.
2019-07-24T05:55:24.460000Z ERROR PostgreSQL ["\"raw\".\"a2\""] Database error 22P02: invalid input syntax for integer: "NULL"
CONTEXT: COPY a2, line 1, column f_batch: "NULL"
2019-07-24T05:55:24.461000Z ERROR PostgreSQL ["\"raw\".\"a2\""] Database error 22007: invalid input syntax for type timestamp: "NULL"
CONTEXT: COPY a2, line 1, column f_docdate: "NULL"
Amaresh S M
  • 2,936
  • 2
  • 13
  • 25
Grooth
  • 65
  • 1
  • 6
  • Try to replace the word NULL in the file with empty string (without the quotes), for example: `1,'6',1,'Details',2011-02-22 00:00:00.000,,,1,,,2,'00000001.TIF',...` – Nir Alfasi Jul 24 '19 at 06:33
  • 1
    This has worked. Thank you very much. I will now just need to update my SQL server export script to spew out "" in place of the text NULL for NULL values. – Grooth Jul 24 '19 at 07:41

1 Answers1

1

From the docs of pgloader:

null if

This option takes an argument which is either the keyword blanks or a double-quoted string.

When blanks is used and the field value that is read contains only space characters, then it’s automatically converted to an SQL NULL value.

When a double-quoted string is used and that string is read as the field value, then the field value is automatically converted to an SQL NULL value.

Looks like you're missing --with 'null if "NULL"' in your command.

Otherwise, you should be able to load the CSV data directly from psql:

\copy raw.a2 (NUM, F_FILENO, F_FOLIONO, F_DOC_TYPE, F_DOCDATE, F_BATCH, F_BOX, F_BLUCPY, F_ROUTOPOST, F_ROUTOUSR, F_WFCREATE, LINKEDFILE, DATECREATE, USERCREATE, DATEUPDATE, USERUPDATE, MEDIA, PGCOUNT, GROUPNUM, SUBJECT, PRI, F_FILECAT) FROM 'file.csv' WITH (FORMAT csv, NULL 'NULL')
fphilipe
  • 9,739
  • 1
  • 40
  • 52
  • Thanks a lot. I tried that, but it either seems to a directive pgloader doesn't understand, or I have to provide more parameters with that directive. I may just have to replace the text "NULL" in my input string with "" for now. – Grooth Jul 24 '19 at 07:45
  • You _might_ what to try just leaving the null fields out like ( from your 1st example line) "1,'6',1,'Details',2011-02-22 00:00:00.000,,,1,,,2,'00000001.TIF',..." Changing to "" may cause it to generate a 0 length string. NULL and 0-length string are not the same in Postgres. – Belayer Jul 24 '19 at 22:36