
I want to import my CSV files into a PostgreSQL table. I have an issue when a CSV row doesn't have the required number of columns. Below is a sample of my CSV data; the files have no header. I need to import 200+ CSV files, each with 500,000 records, so almost 100 million rows in total:

1347121698001,1364113253001,en
1346453482000,1354904936000
1347121698001,1354904936000,ru

Schema:

create table TRIPLES(creation_timestamp  bigint
    , deletion_timestamp bigint
    , language_code  text);

CSV Import command:

copy public.TRIPLES(creation_timestamp, deletion_timestamp, language_code)
FROM 'E:/deletions.csv' DELIMITER ',' CSV;

The error I am getting:

ERROR:  missing data for column "language_code"
CONTEXT:  COPY TRIPLES, line 4337: 

How can I skip such rows, or, if possible, import the missing cell as NULL?

  • The only "easy" option I see here is copying it to a table with one column, using an exotic delimiter (so it's not in the CSV file), and later parsing it into the original table (sketched in SQL after these comments) – Vao Tsun Jan 25 '18 at 11:12
  • I have almost 200 CSV files, each with 500,000 records, so almost 100 million rows. It may not work for me @VaoTsun – swaroop pallapothu Jan 25 '18 at 11:23
  • You have to script it all (to avoid repeating the same commands); you can't use Postgres COPY with a bad CSV... – Vao Tsun Jan 25 '18 at 11:25
  • Do you have an example? – swaroop pallapothu Jan 25 '18 at 11:27
  • That's not a CSV file. CSV files are supposed to have the same number of columns in every row. That's what the error says: the column is missing. – Panagiotis Kanavos Jan 25 '18 at 11:27
  • @PanagiotisKanavos, it is the CSV file provided by Google. You can find it at this link: [csv source](https://developers.google.com/freebase/). I successfully imported it into SQL Server. – swaroop pallapothu Jan 25 '18 at 11:31
  • Doesn't matter who created it or what the extension is. COPY expects the same number of columns in every row. So does any other bulk import - they are supposed to work *fast*, not transform the input data. In this case, the data needs cleaning before COPY is used. You could use a small script that splits each row and outputs the same number of rows, appending `,` for every missing column – Panagiotis Kanavos Jan 25 '18 at 11:34
  • The reason bulk import tools can't deal with missing columns is that they can't know *which* column is missing. Could be the last, could be the second. If you are *sure* that only the last one is missing, you can add a single `,` to each row that's missing a column. – Panagiotis Kanavos Jan 25 '18 at 11:42
  • Text files almost always have issues. You'll have to create a process that separates good files from faulty ones and their data. E.g., create a loop that runs COPY for every file in a DROP folder, putting the data in a staging table. If COPY succeeds, move the file to an ARCHIVE folder and copy the data from staging to its proper target. Otherwise move the file to a different folder for inspection and (possibly) discard the data. You'll probably need to keep track of which row came from which file (a sketch of this per-file flow also follows below) – Panagiotis Kanavos Jan 25 '18 at 11:45
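
A minimal sketch of the single-column staging approach from the first comment, assuming the only defect is a missing trailing language_code field. The staging table name and the E'\x01' delimiter are illustrative; any single-byte character that never occurs in the data will do:

-- Staging table that receives each raw line as one text value.
create table staging(line text);

-- Default text format (no CSV option) with a delimiter that never appears
-- in the data, so every line lands whole in the single column no matter
-- how many commas it contains. Note that text format interprets backslash
-- escapes, which is harmless for plain timestamps and language codes.
copy staging FROM 'E:/deletions.csv' DELIMITER E'\x01';

-- Parse the lines into the real table. split_part returns '' for a
-- missing third field, which nullif turns into NULL.
insert into TRIPLES(creation_timestamp, deletion_timestamp, language_code)
select split_part(line, ',', 1)::bigint
     , split_part(line, ',', 2)::bigint
     , nullif(split_part(line, ',', 3), '')
from staging;

truncate staging;  -- ready for the next file

Everything stays inside Postgres, so the 200+ files never need a separate preprocessing pass.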

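A hedged sketch of the per-file process from the last comment, again with illustrative names: each file is loaded in its own transaction and tagged with its source, so a failing file rolls back cleanly and every row's origin stays traceable.

-- Record which row came from which file (column name is illustrative).
alter table TRIPLES add column source_file text;

-- One transaction per file: on any error, ROLLBACK discards the whole
-- file so it can be moved aside for inspection.
BEGIN;
truncate staging;
copy staging FROM 'E:/deletions_001.csv' DELIMITER E'\x01';  -- illustrative file name

insert into TRIPLES(creation_timestamp, deletion_timestamp, language_code, source_file)
select split_part(line, ',', 1)::bigint
     , split_part(line, ',', 2)::bigint
     , nullif(split_part(line, ',', 3), '')
     , 'deletions_001.csv'
from staging;
COMMIT;

A shell or batch loop around psql would run this once per file (use \copy instead of COPY if the files live on the client machine); moving good files to an ARCHIVE folder and faulty ones aside happens outside the database.
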
0 Answers