0

I have a CSV/TSV file with data and want to load that CSV data into Database. I am using Java or Python and PostgreSQL to do that (I can't change that).

The problem is that for each row I make an INSERT query and it is not that efficient if I have let's say 600.000 rows. Is there any more efficient way to do it?

I was wondering if I can take more rows and create just one big query and execute it on my database but I'm not sure if that helps at all or should I divide the data in maybe let's say 100 pieces and execute 100 queries?

Basil Bourque
  • 303,325
  • 100
  • 852
  • 1,154
  • The answer is yes you can. I think you are searching for something what is named "batch insertion". https://stackoverflow.com/questions/758945/whats-the-fastest-way-to-do-a-bulk-insert-into-postgres – Ziemowit Stolarczyk Mar 04 '20 at 20:31
  • thank you, @ZiemowitStolarczyk :) –  Mar 04 '20 at 20:45

2 Answers2

2

If the CSV file is compatible with the format required by copy from stdin, then the most efficient way is to use the CopyManager API.

See this answer or this answer for example code.


If your input file isn't compatible with Postgres' copy command, you will need to write the INSERT yourself. But you can speed up the process by using JDBC batching:

Something along the lines:

PreparedStatement insert = connection.prepareStatement("insert into ...");
int batchSize = 1000;
int batchRow = 0;
// iterate over the lines from the file
while (...) {
   ... parse the line, extract the columns ...
   insert.setInt(1, ...);
   insert.setString(2, ...);
   insert.setXXX(...);
   insert.addBatch();
   batchRow ++;
   if (batchRow == batchSize) {
     insert.executeBatch();
     batchRow = 0);
   }
}
insert.executeBatch();

Using reWriteBatchedInserts=true in your JDBC URL will improve performance even more.

0

Assuming the server can access the file directly, you could try using the COPY FROM command. If your CSV is not of the right format it might still be faster to transcript it to something the COPY command will handle (e.g. while copying to a location that the server can access).

deroby
  • 5,902
  • 2
  • 19
  • 33
  • Never mind, seems Ziemowit Stolarczyk already linked to an answer with pretty much the same idea, but better explained =) – deroby Mar 09 '20 at 19:25