I am putting together a regular Java EE application on JBoss 7 that will use JPA in the data tier. I would like this application to scale with load. It is fairly clear how to scale the web tier (add more machines behind a load balancer), but scaling the data tier is less obvious.

I can probably cluster my database (MySQL). Still, that leaves the JPA layer unclustered. Ideally, JPA would scale by using a clustered in-memory cache backed by MySQL.

When I look around, all the information on JPA scaling seems to be 3-4 years old. People talk about Ehcache, memcached and Infinispan. I am not sure whether this is still current.

Can someone tell me the state of the art in Java EE clustering and scaling, especially in the data tier?

Arjan Tijms
Raj
  • I got amazing answers from Piotr and James. It is unfortunate that SO will only let me mark one as the correct answer. Thanks to both. Next, I need to find out what is best of breed in caching: why would I use anything but memcached? – Raj Apr 26 '12 at 17:21

2 Answers

8

Various caching strategies are still the way to scale JPA/Hibernate (you basically named the most popular options in your question). As far as I know, nothing extraordinary has happened in this field in the last 4-5 years. One more option you haven't mentioned is JBoss Cache. So the second-level cache for JPA/Hibernate still rules in this area.
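
To make the idea concrete, here is a minimal sketch of the read-through pattern that a second-level cache applies: look in memory first, and go to the database only on a miss. The class `ReadThroughCache` and its `loader` function are hypothetical names for illustration, not part of JPA, Hibernate, or any cache product, and the loader merely stands in for a database query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical read-through cache; not a JPA or Hibernate API. */
class ReadThroughCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // assumption: stands in for a database query

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, calling the loader only on a cache miss. */
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }

    /** Drops an entry, for example after the underlying row was updated. */
    void evict(K key) {
        entries.remove(key);
    }

    public static void main(String[] args) {
        AtomicInteger databaseHits = new AtomicInteger();
        ReadThroughCache<Integer, String> cache = new ReadThroughCache<>(id -> {
            databaseHits.incrementAndGet(); // count simulated database round trips
            return "customer-" + id;
        });
        cache.get(1);   // miss: loader runs
        cache.get(1);   // hit: served from memory
        cache.evict(1);
        cache.get(1);   // miss again after eviction
        System.out.println("database hits: " + databaseHits.get()); // prints "database hits: 2"
    }
}
```

A real second-level cache adds what this sketch leaves out: expiry, size limits, transactional consistency, and (for a cluster) replication or invalidation between nodes.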

Why no progress here? My wild guess is that, first of all, people who need scalable applications tend to avoid JPA and Hibernate in areas where high performance is needed. Usually they go with plain SQL wrapped in Spring Framework JdbcTemplate helpers and transaction management. Scalability then becomes a matter of what the database itself can do.

The other trend is to use NoSQL databases. There are plenty of solutions: MongoDB, CouchDB, Cassandra, Redis, to name a few. These are usually Google BigTable-like key-value stores (this is an oversimplification, but it is more or less the idea behind the approach) and they scale very well if you accept their limitations (relations are no longer managed easily, etc.).

Piotr Kochański
  • Hi Piotr, Thanks for your response. I am interested in learning more about JDBC template helpers. Are there any good pointers? – Raj Apr 26 '12 at 17:16
  • http://static.springsource.org/spring/docs/3.1.x/spring-framework-reference/html/jdbc.html is a good read. There are also books, "Spring in Action" is ok – Piotr Kochański Apr 26 '12 at 17:54
  • It's HIGHLY unlikely that people who have been tasked with performance questions ignore or avoid JPA altogether. That wouldn't make much sense, since JPA itself has little impact on performance ... it can easily be compared to any well-optimized SQL statement ... the magic lies within the criteria / criterion API of JPA: a bad criterion will be slow, but so will a bad SQL statement. In fact, one may reduce JPA overhead to "nearly nonexistent" with JPQL ... but that's another story. Everyone who ignores / avoids JPA automatically isn't doing his job right. Sorry. – specializt Feb 15 '16 at 06:37
6

There are many solutions. The two main categories are:

  • scaling the database
  • using a clustered cache to reduce database load

EclipseLink supports data partitioning for sharding data across a set of database instances,

see: http://java-persistence-performance.blogspot.com/2011/05/data-partitioning-scaling-database.html
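
As a rough illustration of what hash-based partitioning means, the sketch below routes each key to one of several database instances. It is plain Java, not EclipseLink's partitioning API (EclipseLink is configured declaratively, as the linked post describes); `ShardRouter` and the JDBC URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical shard router; an illustration only, not EclipseLink's API. */
class ShardRouter {
    private final List<String> shardUrls;

    ShardRouter(List<String> shardUrls) {
        if (shardUrls.isEmpty()) {
            throw new IllegalArgumentException("need at least one shard");
        }
        this.shardUrls = new ArrayList<>(shardUrls); // defensive copy
    }

    /** Maps a partition key to a shard index in [0, number of shards). */
    int shardIndexFor(Object key) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), shardUrls.size());
    }

    /** Returns the connection URL of the shard that owns the key. */
    String shardUrlFor(Object key) {
        return shardUrls.get(shardIndexFor(key));
    }

    public static void main(String[] args) {
        // Assumption: made-up URLs for three MySQL instances
        ShardRouter router = new ShardRouter(Arrays.asList(
                "jdbc:mysql://db0/app", "jdbc:mysql://db1/app", "jdbc:mysql://db2/app"));
        for (int customerId = 1; customerId <= 4; customerId++) {
            System.out.println(customerId + " -> " + router.shardUrlFor(customerId));
        }
    }
}
```

The same key always lands on the same instance, so reads and writes for one customer stay together. The cost is that changing the number of shards remaps most keys, and queries that span shards need extra work.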

You can also use MySQL Cluster,

see: http://www.mysql.com/products/cluster/

Oracle TopLink Grid provides EclipseLink JPA support for integration with Oracle Coherence as a distributed cache,

see: http://www.oracle.com/technetwork/middleware/ias/tl-grid-097210.html

EclipseLink's cache supports clustering through cache coordination,

see: http://wiki.eclipse.org/EclipseLink/Examples/JPA/CacheCoordination
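
The general idea behind coordinating caches across a cluster can be sketched as follows: when one node changes an object, it tells its peers to drop their stale copy. This is a simplified in-process model, not EclipseLink's implementation or API. `CoordinatedCacheNode` is a hypothetical class, and direct method calls stand in for whatever network transport a real cluster would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cluster node with a local cache; an illustration only. */
class CoordinatedCacheNode {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final List<CoordinatedCacheNode> peers = new ArrayList<>();

    void addPeer(CoordinatedCacheNode peer) {
        peers.add(peer);
    }

    /** Stores a value in this node's cache only (e.g. after a database read). */
    void cache(String key, String value) {
        local.put(key, value);
    }

    /** Returns the locally cached value, or null if this node has none. */
    String cached(String key) {
        return local.get(key);
    }

    /** Applies an update locally and tells every peer to drop its copy. */
    void update(String key, String value) {
        local.put(key, value);
        for (CoordinatedCacheNode peer : peers) {
            peer.invalidate(key); // assumption: a network message in a real cluster
        }
    }

    /** Removes a possibly stale entry so the next read goes to the database. */
    void invalidate(String key) {
        local.remove(key);
    }

    public static void main(String[] args) {
        CoordinatedCacheNode nodeA = new CoordinatedCacheNode();
        CoordinatedCacheNode nodeB = new CoordinatedCacheNode();
        nodeA.addPeer(nodeB);
        nodeB.addPeer(nodeA);
        nodeA.cache("order:1", "NEW");
        nodeB.cache("order:1", "NEW");
        nodeA.update("order:1", "SHIPPED");
        System.out.println("A sees " + nodeA.cached("order:1")); // prints "A sees SHIPPED"
        System.out.println("B sees " + nodeB.cached("order:1")); // prints "B sees null"
    }
}
```

Sending invalidations rather than the new values keeps messages small, at the price of one extra database read on each peer the next time it needs the object.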

James
  • Hi James, Thanks for your response. We want to do both to scale and avoid any single points of failure. In all my investigation, memcached seems "best of breed": it has wide support and seems to be able to do what any other cache would do. What do you think? – Raj Apr 26 '12 at 17:19