
I am writing a function in Java that generates a unique code, saves it with Hibernate, and then sends it in a GET request. I will have to do this operation 30,000 times in one call.

Will multi-threading help me make it faster, even if I can't separate the HTTP request from the Hibernate insert?

sam winston

1 Answer


I assume you want to make the 30,000 inserts as fast as possible. You could start, say, 100 threads and perform 300 inserts in each of them; however, each thread needs a dedicated database session (a DB connection), otherwise you get no performance benefit.
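A minimal sketch of that thread layout, using an `ExecutorService` with one task per worker. The insert itself is simulated with a counter here; in real code, each worker would open its own Hibernate `Session` or JDBC connection (never shared between threads), as noted in the comments:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelInserts {
    static final int TOTAL = 30_000;
    static final int THREADS = 100;

    public static int run() {
        AtomicInteger inserted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        int perThread = TOTAL / THREADS; // 300 inserts per worker
        for (int t = 0; t < THREADS; t++) {
            pool.submit(() -> {
                // In real code: open a dedicated Hibernate Session / JDBC
                // connection here -- one per worker thread, never shared.
                for (int i = 0; i < perThread; i++) {
                    inserted.incrementAndGet(); // placeholder for one insert
                }
                // ...and close the session/connection here.
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return inserted.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // 30000
    }
}
```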

Think about concurrency through the whole chain

Thread or not? Actually, this depends on your database.

For example, Oracle allows you to execute inserts from multiple sessions (https://dba.stackexchange.com/questions/16131/how-does-oracle-handle-multiple-concurrent-inserts-against-one-table). This means your threads can work together to send data as quickly as possible. It is also a good idea to reduce the commit rate: for speed, commit your transaction only after every 100-1,000 inserts.
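The commit-rate idea can be sketched like this. The Hibernate calls are left as comments (this is not a specific Hibernate API recommendation, just the loop shape); the runnable part only counts how many commits a given interval produces:

```java
public class CommitBatching {
    // Returns how many commits an interval of `commitEvery` produces
    // when inserting `n` rows, committing a final partial chunk if any.
    static int insertWithCommitInterval(int n, int commitEvery) {
        int commits = 0;
        for (int i = 1; i <= n; i++) {
            // session.save(entity);  // the real insert would happen here
            if (i % commitEvery == 0) {
                // tx.commit(); tx = session.beginTransaction();
                commits++;
            }
        }
        if (n % commitEvery != 0) {
            commits++; // final commit for the remainder
        }
        return commits;
    }

    public static void main(String[] args) {
        // 30,000 inserts with a commit every 1,000 rows -> 30 commits
        // instead of 30,000 one-row transactions.
        System.out.println(insertWithCommitInterval(30_000, 1_000)); // 30
    }
}
```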

However, things are not this easy on MS SQL (https://social.msdn.microsoft.com/Forums/sqlserver/en-US/db00512d-0dc5-4bae-9ea4-7d6726d2d731/inserting-into-a-single-heap-table-from-multiple-sessions?forum=sqldatabaseengine), and there are SQL engines that lock the whole table, so your inserts end up being executed completely serially.

How many threads?

This again depends on your database. In production Oracle installations it is a good idea to open no more than 20-100 sessions per application, so do not try to create 1,000 threads. Simply keeping 1,000 database connections open would hurt the database server, and most likely your application too.

Optimize roundtrips

Notice that most of the insert time is spent on the individual calls: 30,000 inserts means 30,000 round trips to the database server, which takes significant time. Unfortunately, Hibernate is also quite inefficient here, so there is not much you can do about it at that level.

A good solution is JDBC batch mode, which works remarkably well with Oracle (giving a 10-100x improvement over row-by-row inserts); see http://viralpatel.net/blogs/batch-insert-in-java-jdbc/
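A sketch of JDBC batching with `PreparedStatement.addBatch()`/`executeBatch()`. The table and column names (`unique_code`, `id`, `code`) are hypothetical placeholders for the asker's schema, and `insertCodes` needs a live connection to run; the `roundTrips` helper just shows the round-trip arithmetic:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class JdbcBatchInsert {
    // Hypothetical table and columns; adjust to your schema.
    static final String SQL = "insert into unique_code (id, code) values (?, ?)";

    // Sends inserts to the server in batches of `batchSize`
    // instead of one round trip per row.
    static void insertCodes(Connection conn, String[] codes, int batchSize)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (int i = 0; i < codes.length; i++) {
                ps.setLong(1, i + 1);
                ps.setString(2, codes[i]);
                ps.addBatch();
                if ((i + 1) % batchSize == 0) {
                    ps.executeBatch(); // one round trip for the whole batch
                }
            }
            ps.executeBatch(); // flush any remaining rows
        }
    }

    // Round trips needed to insert `rows` rows in batches of `batchSize`.
    static int roundTrips(int rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(30_000, 1_000)); // 30 instead of 30000
    }
}
```

With a batch size of 1,000, the 30,000 inserts cost 30 round trips instead of 30,000, which is where the order-of-magnitude speedup comes from.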

Think about your problem

I am fairly sure that inserting 30,000 rows as the result of one operation is a bad idea :) Instead of improving insert speed, think about why you need to insert 30,000 things in the first place. Are you inserting a picture byte by byte?

Gee Bee
  • Nice review. But why do you think that Hibernate is inefficient when it comes to batching? Actually, automatic batching has always been one of the most praised features of ORM solutions. There may be [some configurations](http://stackoverflow.com/a/35794220/4754790) required, but I never had issues with it. – Dragan Bozanovic Mar 16 '16 at 17:02
  • @DraganBozanovic, I found various instances of lousiness in how Hibernate does its job. As a DB architect for 10+ years, I might have too big expectations though :) I tested this a year ago or so. Batching simply did not work when the insert involves an identity generator. Instead, Hibernate generates a lot of select-next-id, then insert operations, executed one by one. Batching of updates isn't seamless either: http://vladmihalcea.com/2015/03/18/how-to-batch-insert-and-update-statements-with-hibernate/ – Gee Bee Mar 16 '16 at 17:26
  • Of course, ORM tools and Hibernate are great tools for what they're built for - mapping objects to table rows. But as soon as we go beyond that, accessing multiple rows or needing to benefit from the SQL operations of a specific database, it is rather a curse. The point of ORM - i.e. mapping between a meaningful domain model of your application and an optimized storage model of your database - can be done much better with other solutions. I found that a PL/SQL data access abstraction layer works much better, and I get full control over query execution and optimization. YMMV, of course :) – Gee Bee Mar 16 '16 at 17:29
  • Regarding batching, it does not work with id generator of type `identity`, true, that's what is written in the link I provided also. But works for all the other id generators. It's not Hibernate's fault, it's the issue with database providers and what they expose to jdbc. Regarding the second comment, I couldn't agree more, when working with an ORM people tend to use it for everything (with a hammer every problem seems like a nail), although most ORM solutions provide a rich API for native queries. As always, it's just a matter of picking right tool for the job. – Dragan Bozanovic Mar 16 '16 at 17:39
  • Actually - I am really not a Hibernate fan, you must have noticed that - single-id generation is not the db vendor's fault. For example, in Oracle (which, just like any other db, hands out sequence values one by one) you can do `select level, seq1.nextval from dual connect by level <= 10;` to get the next 10 sequence values. For inserting 30,000 records, it may make sense to do it in one select and one batch insert, instead of 30,000 selects and 30,000 inserts. It is totally doable, Oracle sites are full of this info, yet the Hibernate people ignore it. – Gee Bee Mar 16 '16 at 17:41
  • Also, it is the question of the century why people write database code in Java/.NET. Even in the 90s everyone knew to use stored procedures. Since then, Oracle has gained full OOP database support, packages, and a lot of built-in, endlessly optimized functions. Instead we pull data to the Java side with a for-loop, sum it up, and dare to post that as an "example" to follow? There is something strange in the universe :) – Gee Bee Mar 16 '16 at 17:47
  • If you insert 10 records in a batch, and then you need the generated ids to fill in the foreign keys for another batch of the associated records, how can you get those keys without the help of db vendor? What's your other choice than inserting them one by one? I don't understand which other choice ORM provider or any other persistence framework has? – Dragan Bozanovic Mar 16 '16 at 17:48
  • This is a fundamental problem of "where to put the storage logic". Generating foreign keys for the associated record is a storage-logic problem, which is now brought into the application instead of being solved in the database. The right solution here is to use a PL/SQL package, pass the data from the application's perspective, and let the PL/SQL package handle the storage logic and fiddle with foreign keys and sequences. Those are not your duty :) In fact, you can push down 99% of the business logic of your app to the db, see http://thinkoracle.blogspot.hu/2006/01/plsql-vs-j2ee.html – Gee Bee Mar 16 '16 at 17:52
  • The problem is that SQL - by design - is optimized for multi-row operations. An ORM mapper works row-by-row, hence the bottleneck if you want to do anything more serious. https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:5662105500346364576 – Gee Bee Mar 16 '16 at 17:55
  • So this is how it goes: business logic (I want to make a salary raise) - storage logic (hey, we need two tables because of n-m relations, plus ids and foreign keys) - storage (tables and indexes). Hibernate makes its assumptions, generates some storage-logic code, and hides the storage altogether. It may be efficient or not, depending on the rows to move. But even the business logic can be inefficient. You want to give a salary increase to 1 million of your employees - that is one line in SQL, but in Java it means pulling 1 million records into the code, updating them, and sending them back to the db. – Gee Bee Mar 16 '16 at 17:58
  • How is it not my duty to take care of ids if I need to return the ids to UI so that the user can use it in subsequent requests? :) I have nothing against using stored procedures for things that are bottlenecks. I do it myself. But putting everything to db means spending millions to expensive hardware and licenses. Also, all of the widely available tools, libraries and know-how in the middle layer would need to be reimplemented in stored procedures, meaning more costs not only for implementation, but because of time to delivery, testing, bugs... – Dragan Bozanovic Mar 16 '16 at 18:08
  • Regarding your extreme example with millions of records, I agree completely with you. So do ORM providers authors. It is possible to do it in bulk HQL statements in Hibernate for example. And there is a rich native query API as well for everything else. Anyway, SO is not a forum, probably moderators will delete these comments anyway. :) – Dragan Bozanovic Mar 16 '16 at 18:10
  • _"For inserting 30000 records, it may make sense to do it on one select, and one batch insert, instead of 30000 selects and 30000 inserts. It is totally doable, oracle sites are full with this info, still Hibernate people are ignorant."_ Actually, sequences do not suffer from this, the support for allocation size is [there](https://docs.oracle.com/javaee/7/api/javax/persistence/SequenceGenerator.html#allocationSize--) for ages in the ORM frameworks. Batching works fine with Hibernate and sequences. – Dragan Bozanovic Mar 16 '16 at 18:26
  • Hm. It has a default value of 50. I have never seen any attempt from Hibernate to pre-select the sequences or use any batching, even though I spent two days optimizing this. I checked what was happening with a JDBC sniffer tool, and it was totally missing any reasonable optimization (and was slow as well). I am happy if it works better for you :) – Gee Bee Mar 16 '16 at 19:58