
I am trying to create a web application whose primary objective is to insert request data into a database.

Here is my problem: a single request contains 10,000 to 100,000 data sets of information, and each data set needs to be inserted as a separate row in the database.

I may get multiple requests on this application concurrently, so it's necessary for me to make the inserts fast.

I am using a MySQL database. Which approach is better for me: LOAD DATA, batch INSERT, or is there a better way than these two?

How will your application retrieve this information? - There will be another background Java application that will select records from this table, process them one by one, and delete them.

Can you queue your requests (batches) so your system will handle them one batch at a time? - For now we are thinking of inserting the data into the database straight away, but if this approach is not feasible we may consider queuing the data.

Do retrievals of information need to be concurrent with insertion of new data? - Yes, we are keeping it concurrent.

The above are my answers to your questions, Ollie Jones.

Thank you!

Avinash Nair
  • What are your performance requirements? ("Fast" isn't a usable performance specification when you're handling batches of a megarow each.) Can you queue your requests (batches) so your system will handle them one batch at a time? How will your application retrieve this information? Do retrievals of information need to be concurrent with insertion of new data? What is your table definition? – O. Jones Jan 19 '13 at 13:25
  • @Ollie Jones, I have edited my question. Please check thanks! – Avinash Nair Jan 20 '13 at 07:35
  • possible duplicate of [MySQL Query, bulk insertion](http://stackoverflow.com/questions/672386/mysql-query-bulk-insertion) and [What is the best way to achieve speedy inserts of large amounts of data in MySQL](http://stackoverflow.com/q/314593/62576) – Ken White Jan 20 '13 at 07:56

1 Answer


Ken White's comment mentioned a couple of useful SO questions and answers for handling bulk insertion. For the record volume you are handling, you will enjoy the best success by using MyISAM tables and LOAD DATA INFILE data loading, from source files in the same file system that's used by your MySQL server.
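As a rough sketch of the LOAD DATA INFILE route, here is how the statement could be built and issued over JDBC. The `request_data` table name, file path, and connection details are placeholders I've invented for illustration, not anything from the question:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BulkLoader {

    // Builds a LOAD DATA INFILE statement for a CSV file that already
    // sits on the MySQL server's own file system.
    public static String buildLoadStatement(String filePath, String table) {
        return "LOAD DATA INFILE '" + filePath + "' "
             + "INTO TABLE " + table + " "
             + "FIELDS TERMINATED BY ',' "
             + "LINES TERMINATED BY '\\n'";
    }

    public static void main(String[] args) throws Exception {
        String sql = buildLoadStatement("/var/lib/mysql-files/batch.csv",
                                        "request_data");
        // URL and credentials are placeholders; adjust for your server.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "password");
             Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }
}
```

Note that the file must be readable by the MySQL server process itself, and `secure_file_priv` may restrict which directories LOAD DATA INFILE can read from.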

What you're doing here is a kind of queuing operation. You receive these batches (you call them "requests") of records (you call them "data sets"). You put them into a big bucket (your MySQL table). Then you take them out of the bucket one at a time.

You haven't described your problem completely, so it's possible my advice is wrong.

Is each record ("data set") independent of all the others?

Does the order in which the records are processed matter? Or would you obtain the same results if you processed them in a random order? In other words, do you have to maintain an order on the individual records?

What happens if you receive two million-row batches ("requests") at approximately the same time? Assuming you can load ten thousand records a second (that's fast!) into your MySQL table, this means it will take 200 seconds to load both batches completely. Will you try to load one batch completely before beginning to load the second?
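The 200-second figure is just total rows divided by throughput; a trivial helper makes the estimate explicit (the ten-thousand-rows-per-second rate is the assumption stated above, not a measurement):

```java
public class LoadTimeEstimate {

    // Seconds needed to load `rows` records at a sustained
    // `rowsPerSecond` insertion rate.
    public static long secondsToLoad(long rows, long rowsPerSecond) {
        return rows / rowsPerSecond;
    }

    public static void main(String[] args) {
        // Two batches of one million rows each, at 10,000 rows/second.
        System.out.println(secondsToLoad(2_000_000L, 10_000L) + " seconds");
    }
}
```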

Is it OK to start processing and deleting the rows in these batches before the batches are completely loaded?

Is it OK for a record to sit in your system for 200 or more seconds before it is processed? How long can a record sit? (This is called "latency.")

Given the volume of data you're mentioning here, if you're going into production with live data you may want to consider using a queuing system like ActiveMQ rather than a DBMS.

It may also make sense simply to build a multi-threaded Java app to load your batches of records, deposit them into a Queue object in RAM (a ConcurrentLinkedQueue instance may be suitable) and process them one by one. This approach will give you much more control over the performance of your system than you will have by using a MySQL table as a queue.
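A minimal sketch of that in-RAM approach follows; the class and method names are mine, and the per-record processing step is left as a comment:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class BatchQueue {

    // Shared lock-free queue: request-handling threads offer records,
    // a background worker polls and processes them one at a time.
    private final ConcurrentLinkedQueue<String> queue =
            new ConcurrentLinkedQueue<>();

    // Called by a request-handling thread for each incoming batch.
    public void enqueueBatch(Iterable<String> records) {
        for (String record : records) {
            queue.offer(record);
        }
    }

    // Called by the worker thread; drains whatever is currently queued
    // and returns the number of records handled.
    public int drain() {
        int processed = 0;
        String record;
        while ((record = queue.poll()) != null) {
            // process(record) would go here
            processed++;
        }
        return processed;
    }
}
```

Because `ConcurrentLinkedQueue` is non-blocking, a real worker would typically sleep briefly or use a `BlockingQueue` variant instead of spinning when `poll()` returns null.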

O. Jones
  • Would try implementing `ConcurrentLinkedQueue`. This would definitely be more efficient than doing any database activity. Thanks Ollie – Avinash Nair Jan 26 '13 at 08:20