I am new to web development and am planning to build a scalable web app on AWS. I have an architecture-level question about using Hibernate on AWS. On AWS, multiple EC2 instances will be running in different parts of the world. Now: 1) Should they all use Hibernate locally to connect to a central MySQL server? 2) Or should there be a single Hibernate machine to which all EC2 instances push their database queries?

If it's approach 1), does Hibernate support transaction management in such a distributed system? I believe caching would also be a major problem with 1), because one EC2 instance might not know that another one has updated a table in the DB. How is ORM handled in such distributed systems in general?
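
To make the stale-cache worry concrete, here is a minimal, stdlib-only Java simulation. It assumes each EC2 instance keeps a private in-process cache (similar in spirit to a per-JVM second-level cache) in front of one shared database; `SharedDatabase`, `CachingNode` and `StaleCacheDemo` are hypothetical names, not Hibernate classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the central MySQL server: one shared table of row id -> value.
class SharedDatabase {
    private final Map<Integer, String> rows = new HashMap<>();

    synchronized String read(int id) {
        return rows.get(id);
    }

    synchronized void write(int id, String value) {
        rows.put(id, value);
    }
}

// Hypothetical app node (one per EC2 instance) with its own private, in-process cache.
// Nothing notifies this cache when another node writes to the database.
class CachingNode {
    private final SharedDatabase db;
    private final Map<Integer, String> localCache = new HashMap<>();

    CachingNode(SharedDatabase db) {
        this.db = db;
    }

    String read(int id) {
        // Serve from the local cache if present; otherwise load from the DB and remember it.
        return localCache.computeIfAbsent(id, key -> db.read(key));
    }

    void write(int id, String value) {
        db.write(id, value);
        localCache.put(id, value); // only THIS node's cache is updated
    }
}

class StaleCacheDemo {
    // Returns what node B sees after node A has updated the row.
    static String run() {
        SharedDatabase db = new SharedDatabase();
        db.write(1, "price=10");
        CachingNode nodeA = new CachingNode(db);
        CachingNode nodeB = new CachingNode(db);
        nodeB.read(1);              // node B caches "price=10"
        nodeA.write(1, "price=12"); // the database now holds "price=12"
        return nodeB.read(1);       // node B still answers from its stale cache
    }
}
```

`StaleCacheDemo.run()` returns `"price=10"` even though the database holds `"price=12"`: node B never learns about node A's write.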

Thanks in advance

Rahul

1 Answer

Approach #1

Scalability is important for us, and we use your approach #1. We deploy isolated instances of Hibernate (as a service behind Tomcat servers on EC2). When demand increases and more instances are needed, it's easy to add them. Benefits of this approach:

  • encapsulation (not in the OO sense), as each instance is configured in isolation
  • easier configuration, as only a single instance image is needed
  • scalable
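
A rough sketch of the "single image, many isolated instances" idea, assuming every instance is launched from one shared configuration that points at the same central database. `InstanceImage`, `AppInstance` and `Fleet` are hypothetical, `privateState` merely stands in for each instance's own Hibernate setup, and the JDBC URL in the usage note is made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "instance image": the one configuration every instance is launched from.
final class InstanceImage {
    final String jdbcUrl; // assumed endpoint of the single central MySQL/RDS database

    InstanceImage(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }
}

// Hypothetical app instance: its own private state (stand-in for its own Hibernate setup),
// but every instance talks to the same database URL.
final class AppInstance {
    final int id;
    final String jdbcUrl;
    final Map<String, Object> privateState = new HashMap<>();

    AppInstance(int id, InstanceImage image) {
        this.id = id;
        this.jdbcUrl = image.jdbcUrl;
    }
}

// Scaling out = launching one more instance from the same image; no central tier to resize.
final class Fleet {
    private final InstanceImage image;
    private final List<AppInstance> instances = new ArrayList<>();

    Fleet(InstanceImage image) {
        this.image = image;
    }

    AppInstance launch() {
        AppInstance instance = new AppInstance(instances.size() + 1, image);
        instances.add(instance);
        return instance;
    }

    List<AppInstance> instances() {
        return Collections.unmodifiableList(instances);
    }
}
```

For example, `new Fleet(new InstanceImage("jdbc:mysql://central-db.example:3306/app"))` followed by repeated `launch()` calls gives instances that share a database URL but nothing else, so adding capacity is a matter of starting another copy.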

About transaction management: this is done by the DB engine itself, not by Hibernate, and not by the EC2 instances you mentioned. You can have one large MySQL instance on RDS, and it can handle transactions from several "clients" (Hibernate, JDBC, SQL consoles, etc.).

In this case, Hibernate "doesn't matter". The DB will also gracefully handle concurrency and deadlocks (this may need some tuning), if that is what you're worried about. If you look at this architecture from the database engine's perspective, things become easier to understand: you don't need a centralized integration tier (like Hibernate) to make sure the DB handles all transactions. The database is responsible for that.
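
A minimal simulation of "the database is responsible for transactions", assuming one shared row protected by the engine's own lock. `SimulatedDbEngine` and `IndependentClients` are hypothetical; a real MySQL/InnoDB server uses row-level locks and isolation levels rather than a single Java lock, and each instance would demarcate its transactions on its own connection (for example with JDBC's `setAutoCommit(false)`, `commit()` and `rollback()`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the DB engine: it owns the data AND the locking,
// so clients never need to coordinate with each other.
class SimulatedDbEngine {
    private final ReentrantLock rowLock = new ReentrantLock(); // like a lock on one row
    private long balance = 0;

    // One "transaction": read-modify-write done atomically under the engine's lock.
    void deposit(long amount) {
        rowLock.lock();
        try {
            long current = balance;     // read
            balance = current + amount; // write
        } finally {
            rowLock.unlock();           // "commit": release the lock
        }
    }

    long balance() {
        rowLock.lock();
        try {
            return balance;
        } finally {
            rowLock.unlock();
        }
    }
}

class IndependentClients {
    // Each thread plays one EC2 instance with its own client (Hibernate, JDBC, ...),
    // sending transactions straight to the engine with no shared coordinator.
    static long run(int clients, int txPerClient) throws InterruptedException {
        SimulatedDbEngine db = new SimulatedDbEngine();
        List<Thread> threads = new ArrayList<>();
        for (int c = 0; c < clients; c++) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < txPerClient; i++) {
                    db.deposit(1);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return db.balance();
    }
}
```

`IndependentClients.run(8, 10_000)` returns `80000`: eight independent clients, no coordinator between them, and no lost updates, because the serialization happens inside the engine.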

Approach #2

I have never seen or used it. Although in theory it could provide some caching, it isn't scalable. What if you need 100 instances of your app? Would this ONE instance handle all the DB traffic? Probably not, whereas approach #1 scales nicely. If your concern is only transaction management, approach #2 isn't necessary.
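
A back-of-the-envelope check of the "ONE instance handles all the DB traffic" problem. `GatewayCapacity` is a hypothetical helper, and the numbers in the usage note are hypothetical inputs, not measurements.

```java
// Hypothetical capacity check for approach #2 (a single central "Hibernate machine").
final class GatewayCapacity {
    // Queries per second the central machine would have to absorb from all app instances.
    static long requiredQps(int appInstances, long qpsPerInstance) {
        return (long) appInstances * qpsPerInstance;
    }

    // Largest number of app instances one central machine can serve before it saturates.
    static long maxInstances(long gatewayCapacityQps, long qpsPerInstance) {
        return gatewayCapacityQps / qpsPerInstance; // integer division: whole instances only
    }
}
```

With 100 app instances at 200 queries/s each, `requiredQps(100, 200)` is 20,000 queries/s; if the central machine could sustain 5,000 queries/s, `maxInstances(5_000, 200)` is 25, so it saturates long before 100 instances.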

Final Comments

Things get more complicated when you have multiple MySQL database instances around the world instead of a centralized one. In that case, one DB instance doesn't know about the others: for example, your database in the US doesn't know the data in the one in Japan. Nightly batch jobs could help here, but consolidation and integration (ETL, EAI) are a separate issue. Since you have a single instance (and rightly so), things are simpler, and I can't see why approach #1 wouldn't take care of all the concerns you mentioned.
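
A sketch of what a nightly consolidation job might do with regional databases, assuming every row carries a last-updated timestamp and that "last write wins" is an acceptable conflict rule. Both assumptions, and the names `RowVersion` and `NightlyConsolidation`, are illustrative; real ETL/EAI integration usually needs business-specific merge rules.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row version: a value plus the time it was last written (epoch millis).
final class RowVersion {
    final String value;
    final long updatedAt;

    RowVersion(String value, long updatedAt) {
        this.value = value;
        this.updatedAt = updatedAt;
    }
}

final class NightlyConsolidation {
    // Merge regional snapshots (e.g. US, Japan) into one consolidated view.
    // Assumed conflict rule: the newer timestamp wins; on a tie the earlier region's row is kept.
    static Map<String, RowVersion> merge(List<Map<String, RowVersion>> regions) {
        Map<String, RowVersion> merged = new HashMap<>();
        for (Map<String, RowVersion> region : regions) {
            for (Map.Entry<String, RowVersion> e : region.entrySet()) {
                merged.merge(e.getKey(), e.getValue(),
                        (current, incoming) -> incoming.updatedAt > current.updatedAt ? incoming : current);
            }
        }
        return merged;
    }
}
```

If the US copy says an order was `shipped` at time 200 and the Japan copy says `paid` at time 150, the merged view keeps `shipped`; rows that exist in only one region are carried over unchanged.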

arnold
Thanks Arnold, my other worry about approach #1 was stale caches. Since Hibernate caches things itself, it will take some time for the data posted by one EC2 instance to become visible to all the other EC2 instances. In some cases that doesn't matter, but a few tables may need frequent updates. Is it possible to configure Hibernate to do less caching for some particular tables and retrieve their data from RDS more frequently? – Rahul Aug 14 '13 at 02:56
Hi @Rahul, yes, it is possible. I prefer to avoid caching "volatile" data that you know will be changed a lot by other users/processes. More static data, however, like products, categories, user roles, etc.: how often do these change? Things like prices (depending on the business), available seats, inventory quantity on hand, etc. are more volatile, and I wouldn't cache those entities (second-level caching). There's already a nice answer on Stack Overflow about Hibernate caching. I hope this helps: http://stackoverflow.com/questions/4852685/hibernate-cache-reference – arnold Aug 14 '13 at 15:25
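
The "cache static data, skip volatile data" policy can be sketched in plain Java as a read-through cache that only stores whitelisted entity types. `SelectiveCache` is hypothetical and is not how Hibernate implements its cache; in Hibernate itself the equivalent choice is typically made per entity, for example by annotating only the static entities with JPA's `@Cacheable` or Hibernate's `@Cache` annotation and leaving the volatile ones unannotated.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical read-through cache that only caches entity types declared "static".
// Volatile types (prices, seats, stock on hand) always go to the database.
final class SelectiveCache {
    private final Set<String> cacheableTypes;
    private final Function<String, String> loadFromDb; // assumed loader: key "Type:id" -> value
    private final Map<String, String> cache = new HashMap<>();
    private int dbReads = 0;

    SelectiveCache(Set<String> cacheableTypes, Function<String, String> loadFromDb) {
        this.cacheableTypes = cacheableTypes;
        this.loadFromDb = loadFromDb;
    }

    String get(String type, long id) {
        String key = type + ":" + id;
        if (!cacheableTypes.contains(type)) {
            dbReads++;
            return loadFromDb.apply(key); // volatile type: never cached
        }
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // static type already cached: no database round trip
        }
        dbReads++;
        String loaded = loadFromDb.apply(key);
        if (loaded != null) {
            cache.put(key, loaded);
        }
        return loaded;
    }

    int dbReads() {
        return dbReads;
    }
}
```

With only `Category` whitelisted, two reads of `Category:1` hit the database once, while every read of `Inventory:7` goes to the database and sees the latest quantity.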