I have the following business logic.

A user uploads a file (xls, csv, or Google Sheet). Files can contain about 80K rows.

A database record (PostgreSQL) is created from the data in each row, and each record is indexed in Elasticsearch.

This takes quite a long time, so I run the data processing and the database writes in Celery.

How else can I speed up the data processing and the creation of records in the database? Multiprocessing? Threads? Celery chunks?

unknown
  • There are a lot of best practices for performance tuning for this use case. For instance, you can try inserting into Postgres in batches of 1k-5k rows and do the same for Elasticsearch. You can also consider indexing in Elasticsearch in a separate thread so that it isn't a synchronous process – planben Aug 19 '20 at 09:11
  • @planben I thought about doing it this way. Could you give an example? – unknown Aug 19 '20 at 09:20
  • In case it matters, I am using Flask and SQLAlchemy – unknown Aug 19 '20 at 09:24
  • While looking for an example, I came across the following post: https://stackoverflow.com/questions/758945/whats-the-fastest-way-to-do-a-bulk-insert-into-postgres which led me to the Postgres COPY command, which can take a CSV file and load it into a table: https://www.postgresql.org/docs/current/sql-copy.html – planben Aug 19 '20 at 09:28
  • @planben It's not that simple. I need to perform certain operations on the data I receive from the files in a loop (~80K iterations!) and only then write the results to the database. So it will not be that easy to load the data from the file straight into the database. – unknown Aug 19 '20 at 09:33
  • I still think that COPY will do the job if you write the post-processed rows to a file and then load that file into Postgres using COPY. But there are alternatives: you can also use a bulk insert (https://www.postgresql.org/docs/8.4/dml-insert.html), which is a single INSERT command that inserts multiple rows. You can tune the batch size to see which value works best for you. Since your file has 80K rows, I would start with 4k – planben Aug 19 '20 at 09:37
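The approach suggested in the comments (process the rows in Python first, then load them in batches with COPY) could look roughly like the sketch below. It is a minimal illustration, not a tested solution for this exact setup: the table `items`, its columns `name` and `qty`, and the `transform` function are hypothetical placeholders, and `cursor` is assumed to be a psycopg2 cursor, whose `copy_expert` method streams a file-like object to the server.

```python
import csv
import io
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def transform(row):
    """Hypothetical per-row processing; replace with the real logic."""
    return (row["name"].strip(), int(row["qty"]))


def rows_to_csv_buffer(rows):
    """Write already-processed rows to an in-memory CSV file for COPY."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    buffer.seek(0)
    return buffer


def copy_batch(cursor, rows):
    """Load one batch with COPY; `cursor` is assumed to be a psycopg2 cursor."""
    # `items`, `name` and `qty` are placeholder names, not from the question.
    cursor.copy_expert(
        "COPY items (name, qty) FROM STDIN WITH (FORMAT csv)",
        rows_to_csv_buffer(rows),
    )


def load(cursor, raw_rows, batch_size=4000):
    """Process rows in Python, then send them to PostgreSQL batch by batch."""
    total = 0
    for batch in batched(raw_rows, batch_size):
        processed = [transform(row) for row in batch]
        copy_batch(cursor, processed)
        total += len(processed)
    return total
```

With SQLAlchemy, a psycopg2 cursor can usually be obtained through `engine.raw_connection().cursor()`, and the commit then has to be issued on that raw connection. The batch size of 4000 follows the starting point suggested above and is worth tuning. The same `batched` helper could feed batches to the Elasticsearch bulk API instead of indexing one document at a time.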
