
I want to add a cache to a distributed application that uses a Galera cluster (MySQL) as its DB. Each application node (Java) has a local cache for reads and communicates with a DB node.

The problem is that I don't know how to invalidate cache entries when they are modified by another node in the cluster.
My first attempt was to use DB triggers, but I soon realised that triggers aren't fired by replicated updates.
Other ideas are monitoring network traffic for modifications or watching the DB's binlogs, but both seem very difficult to implement.

The question is: Is there any practical way to detect changes in one node coming from another in replication?
Other ideas in implementing cache invalidation?

I think I could also use a distributed cache that communicates changes between the application nodes, but I'd prefer to use an isolated cache in each app node and delegate data synchronization between nodes to the DB cluster. I see cache-to-cache communication as redundant network traffic...

Thanks in advance.

fonkap
  • IMHO this "redundant network traffic" may be useful, as DB communication may be slowed down by other concerns. The cache communication is pretty simple and needs only a few bytes to be transferred, so I think it's a good idea (subject to the possibility that you may not need any caching at all, as stated in the answer). – maaartinus Jul 22 '17 at 20:48
  • @maaartinus thank you very much for the bounty, and thank you for your comment. Can we conclude that it is best to evaluate the performance requirements, avoid caching if possible, and, in case it is really needed, use some kind of distributed cache (maybe Infinispan)? – fonkap Jul 23 '17 at 18:22
  • You're welcome. I'm not nearly that far along... I'm just thinking about a future architecture using the cluster and exploring possibilities. If you need no caching, then you can save yourself a lot of problems. I'm using a single Java server with caching of the most common request types. I'm using Hibernate `EventListenerRegistry` for invalidations, assuming that there are no other modifications (which is true at the moment). – maaartinus Jul 23 '17 at 18:52
  • Sure! I was talking about multiple Java servers and multiple MySQL nodes. With only one Java server, I think you can get massive performance improvements using the Hibernate cache with few problems. – fonkap Jul 23 '17 at 19:10
  • Sure, multiple servers are planned for midterm future (this year); that's why I'm interested in the cluster. My caching isn't hibernate query cache, but a request-response cache skipping the whole processing (with invalidation based on Hibernate post commit events). According to [this article](http://puredanger.github.io/tech.puredanger.com/2009/07/10/hibernate-query-cache), Hibernate query cache gets invalidated on *every write to a corresponding table*, so it's rather ineffective. Haven't tried. – maaartinus Jul 23 '17 at 19:57
  • Well... sorry if I am mistaken, I am a little rusty on the subject and I never was an expert, but I didn't mean a query cache but a 2nd-level cache (provided by Infinispan). If I remember correctly, it was very effective and it can be clustered. I'd like to humbly suggest investigating a little in that direction... (again, if your performance requirements need it) – fonkap Jul 23 '17 at 21:01
  • My bad, I'm rather green in this respect and I conflated the two (or the way they blew the query cache made me too skeptical about the 2nd level cache). I'm just reading http://blog.infinispan.org/2015/10/hibernate-second-level-cache.html, thank you. – maaartinus Jul 24 '17 at 01:33

2 Answers


To put it bluntly, you have over-engineered the system.

You have a Cluster Node 'near' each Java client, correct? That is, each datacenter has a Node plus one or more Java clients?

At that point there is very little need for an extra cache between Java and the data; simply re-request the data and let MySQL do the caching.

If you have sluggish queries, then let's discuss them and see if they can be sped up.

The setup I describe should be able to handle hundreds, possibly thousands, of queries per second. Is your requirement stiffer than that?

See `wsrep_sync_wait` for dealing with causality checks (for example, guaranteeing that a node reads its own cluster-wide writes).
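As a sketch of how that looks in practice (table and column names here are made up for illustration): a session that needs causal reads enables the check before selecting, at the cost of some extra read latency.

```sql
-- wsrep_sync_wait = 1 makes plain SELECTs wait until this node has
-- applied all writesets committed anywhere in the cluster before it,
-- so the read cannot return pre-replication (stale) data.
SET SESSION wsrep_sync_wait = 1;

-- Hypothetical query: now guaranteed to see writes already committed
-- on any other Galera node.
SELECT balance FROM accounts WHERE id = 42;
```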

Rick James
  • Thank you very much for your response. You are absolutely right, I really didn't need that much performance, however, I think the main question is still valid and remains unanswered. – fonkap Jul 17 '17 at 13:57
  • Maybe your implicit answer ("you can't") is the only one, and you gave me the best possible answer. Anyway, if we find some consensus here I have no problem accepting it. – fonkap Jul 17 '17 at 14:01

The only way I can think of to detect changes using only MySQL is to parse the binary logs.
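As an illustration of that route (a sketch, not production code): MySQL ships the `mysqlbinlog` utility, which can tail a remote server's binary log continuously, and a watcher process could parse that stream for writes to the cached tables. The host, user, and log file name below are placeholders, and note that whether replicated Galera writesets show up in a node's binlog at all depends on its `log-bin`/`log_slave_updates` configuration, which is part of why this approach tends to be fragile.

```shell
# Tail the binary log of one Galera node; --stop-never keeps the
# connection open so new events stream in as they are committed.
mysqlbinlog --read-from-remote-server \
    --host=db-node-1 --user=repl --password \
    --stop-never --verbose mysql-bin.000001
```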

Since you asked for alternatives to that, you could use a messaging system (e.g. Apache ActiveMQ) to send cache invalidation messages between nodes so that each node can clear its own cache.
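A minimal in-process sketch of that pattern, in Java since that is what the application nodes run. `InMemoryBus` stands in for a real broker such as an ActiveMQ topic, and all class and method names here are made up for illustration; the point is only the shape: a write broadcasts the key, and every node (including the writer) evicts it from its local cache.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Stand-in for a message broker topic: every subscriber sees every
// published key. A real deployment would use e.g. an ActiveMQ topic.
class InMemoryBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();
    void subscribe(Consumer<String> s) { subscribers.add(s); }
    void publish(String key) { subscribers.forEach(s -> s.accept(key)); }
}

// One application node: a local read cache plus an invalidation listener.
class CachingNode {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final InMemoryBus bus;

    CachingNode(InMemoryBus bus) {
        this.bus = bus;
        bus.subscribe(cache::remove); // evict locally on any invalidation message
    }

    String get(String key) { return cache.get(key); }      // null = cache miss
    void put(String key, String value) { cache.put(key, value); } // read-through fill

    void update(String key, String value) {
        // In the real system: write to the DB here first, then broadcast.
        bus.publish(key);       // every node, including this one, evicts the key
        cache.put(key, value);  // the writer re-caches the fresh value
    }
}
```

Other nodes simply miss on their next `get` and reload from the DB, so the cluster's synchronous replication still provides the data; the messages carry only keys, which keeps the "redundant" traffic to a few bytes per write.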

nickrak
  • Thank you @nickrak. I think parsing binlogs may be feasible, but I think it would produce a fragile solution... not sure. What I don't like about using a kind of JMS is its asynchronous nature; I think I would never be sure whether I am using stale cached values... I think it would be best to use some caching solution with built-in distribution. – fonkap Jul 23 '17 at 18:26
  • @fonkap Galera cluster is synchronous, so if the commit completes on any node in the cluster, you won't get stale data on the other database nodes. If you're looking for a distributed cache and want something that's in-process, take a look at Hazelcast; for a separate instance, check out Redis and Memcached. – nickrak Jul 23 '17 at 22:08