
I am trying to find a scalable way for my desktop application to run a command when a change is made in the database.

The application is for running a remote command on your PC. The user logs into the website and can choose to run the command. Currently, users have to download a desktop application that checks the database every few seconds to see if a value has changed. The value can only be changed when they log in to the website and press a button.

For now it seems to be working fine since there aren't many users, but once I hit 100+ users, having them all hit the database every few seconds is not good. What might be a better approach?
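To make the current approach concrete, each desktop client effectively does something like this (a minimal Python sketch, not my actual code; the table, column, and connection details are placeholders):

```python
import subprocess
import time

import pymysql  # hypothetical client library; the real app may use a different one

# Placeholder connection settings and schema -- for illustration only.
conn = pymysql.connect(host="db.example.com", user="app", password="secret",
                       database="app", autocommit=True)

def poll_for_command(user_id: int) -> None:
    while True:
        with conn.cursor() as cur:
            cur.execute("SELECT run_requested FROM remote_commands WHERE user_id = %s",
                        (user_id,))
            row = cur.fetchone()
        if row and row[0]:
            subprocess.run(["my-command"])  # the command the user triggered from the website
            with conn.cursor() as cur:
                cur.execute("UPDATE remote_commands SET run_requested = 0 WHERE user_id = %s",
                            (user_id,))
        time.sleep(5)  # "every few seconds" -- this is the part that won't scale to 100+ users
```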

laskdjf
  • By not polling... – Dai Jul 18 '21 at 18:01
  • If you're using SQL Server, [you can use `SqlDependency` to be notified when a base-table changes](https://stackoverflow.com/questions/12353225/is-there-a-way-to-be-notified-in-net-when-data-changes-in-the-sql-server-like). – Dai Jul 18 '21 at 18:02
  • Is there anything in MySQL that allows similar functionality? – laskdjf Jul 18 '21 at 18:45
  • I imagine the "best" approach would be writing a **native** MySQL UDF ( see https://dev.mysql.com/doc/extending-mysql/8.0/en/adding-functions.html ) which would be invoked by a `TRIGGER` to then use IPC to implement a notification system. But honestly the storage-layer is not the best place to implement this. – Dai Jul 18 '21 at 18:59

1 Answer


It's true that polling for changes is too expensive, especially if you have many clients. The queries are often very costly, and it's tempting to run the queries frequently to make sure the client gets notified promptly after a change. It's better to avoid polling the database.

One suggestion in the comments above is to use a UDF called from a trigger. But I don't recommend this, because a trigger runs when you do an INSERT/UPDATE/DELETE, not when you COMMIT the change. So a client could be notified of a change, and then when they check the database the change appears to not be there, because either the transaction was rolled back, or else the transaction simply hasn't been committed yet.

Another reason the trigger solution is not good is that MySQL triggers execute once for each row changed, not once for each INSERT/UPDATE/DELETE statement. So you could cause notification spam, if you do an UPDATE that affects thousands of rows.

A different solution is to use a message queue like RabbitMQ or ActiveMQ or Amazon SQS (there are many others). When a client commits their INSERT/UPDATE/DELETE, they confirm the commit succeeded, then post a message on a message queue topic. Many clients can be notified efficiently this way. But it requires that every client who commits changes to the database write code to post to the message queue.
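For example, with RabbitMQ and the Python `pika` client, the code that handles the button press on the website could publish to a fanout exchange right after it confirms the COMMIT succeeded. This is only a sketch; the host, exchange name, and message format are made up for illustration:

```python
import pika  # RabbitMQ client library; host and exchange names below are illustrative

connection = pika.BlockingConnection(pika.ConnectionParameters(host="mq.example.com"))
channel = connection.channel()

# A fanout exchange copies every message to all bound queues (one queue per desktop client).
channel.exchange_declare(exchange="command_events", exchange_type="fanout")

def notify_command_requested(user_id: int) -> None:
    """Call this only after the database COMMIT has succeeded."""
    channel.basic_publish(exchange="command_events", routing_key="", body=str(user_id))
```

Each desktop client would then declare its own queue, bind it to the exchange, and block in `basic_consume()` until a message arrives, instead of polling the database.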

Another solution is for clients to subscribe to MySQL's binary log and read it as a change data capture log. Every committed change to the database is logged in the binary log. You can make clients read this, and it has no more impact on the database server than a replication client does (MySQL can easily support hundreds of replicas).
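For instance, the third-party `python-mysql-replication` package lets a client subscribe to the binary log as a stream of row events. This sketch assumes row-based binary logging (`binlog_format=ROW`) is enabled and uses placeholder connection settings:

```python
# pip install mysql-replication  (third-party package, not part of MySQL itself)
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent

stream = BinLogStreamReader(
    connection_settings={"host": "db.example.com", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=100,          # must be unique among replicas and other binlog readers
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    blocking=True,          # wait for new events instead of polling
    resume_stream=True,
)

for event in stream:
    for row in event.rows:
        # Only committed changes show up here; react to the change (e.g. run the command).
        print(event.table, row)
```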

A hybrid solution is to consume the binary log, and turn those changes into events in a message queue. This is how a product like Debezium works. It reads the binary log, and posts events to an Apache Kafka message queue. Then other clients can wait for events on the Kafka queue and respond to them.
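To give a sense of the consumer side, a desktop client could listen to the Debezium topic for the relevant table with the `kafka-python` package. The broker address and topic name here are hypothetical, and Debezium's real change-event envelope carries more fields than this sketch inspects:

```python
import json

from kafka import KafkaConsumer  # kafka-python package; broker/topic names are hypothetical

consumer = KafkaConsumer(
    "dbserver1.app.remote_commands",   # Debezium names topics <server>.<schema>.<table>
    bootstrap_servers=["kafka.example.com:9092"],
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    change = message.value
    # Inspect the change event and run the local command if it is addressed to this user.
    print(change)
```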

Bill Karwin
  • Good point about the trigger being invoked before the write is committed, I didn't think of that. That said, _in practice_, and considering the (seemingly) SOHO-nature of the OP's described database I don't think that DML failures will happen much at all... ymmv, ofc. – Dai Jul 18 '21 at 23:08
  • Oh, an idea: how about a custom MySQL replication subscriber (which isn't a database at all), but just listens to the pub/sub messages? I assume such a thing is possible without needing to manually read the binary log. – Dai Jul 18 '21 at 23:11
  • Reading the binary log is exactly what replication clients do. For example, this is how a CDC solution like Debezium subscribes to changes. Here's another blog about it: https://www.percona.com/blog/2016/09/13/mysql-cdc-streaming-binary-logs-and-asynchronous-triggers/ – Bill Karwin Jul 19 '21 at 00:03
  • With apologies, I could have phrased myself better: when you said that they'd read the binary log file I thought you meant they'd all have to come with their own reimplemented library for reading and parsing the MySQL binary log file on-disk - rather than using a well-documented and well-supported API or service of MySQL's. (I guess I expected this after too much experience with awful mid-1990s 4GL database products that are still around where you really did have to reverse-engineer and parse closed, undocumented proprietary binary database disk/file formats in order to do anything...!) – Dai Jul 19 '21 at 00:44