1

I'm trying to write a MySQL stored procedure that accepts a SELECT query and executes it against a list of databases. Is this even possible with MySQL?

Each customer in our application has their own database on the same instance. All customer databases are identical in their schema structure. Sometimes, I need to execute a simple SELECT query but across all the customer databases (e.g. SELECT COUNT(*) FROM users).

I have stored procedures for the common recurring "instance-wide" queries like the user count but I don't want to create more of those for one-time queries (e.g. querying for corrupted records due to by buggy code, querying for columns that we are planning to deprecate, etc.).

My current solution is that I have a node script that I run locally to generate SELECT queries for each database and then concatenate them all using UNION to generate a giant query that I then execute on the database instance.

SELECT 'customerdb1' AS customer,
       COUNT(*) AS user_count
FROM customerdb1.users
UNION
SELECT 'customerdb2' AS customer,
       COUNT(*) AS user_count
FROM customerdb2.users
UNION
SELECT 'customerdb3' AS customer,
       COUNT(*) AS user_count
FROM customerdb3.users
Noel Llevares
  • 15,018
  • 3
  • 57
  • 81
  • 1
    Sure it's possible but it sounds like a bad idea.. Stored procedure can runs on a possible different MySQL user and or privileges.. How would you control it security wise? . – Raymond Nijland Jan 16 '19 at 22:27
  • The stored procedure will only be used by a special account which only has `SELECT` privileges to all other databases. For what reasons is it a bad idea? Security? Performance? Maintainability? – Noel Llevares Jan 16 '19 at 22:40
  • @dashmug Here's a joke around arbitrary SQL input: https://xkcd.com/327/ Stored procedures are created to abstract the database logic (the queries) away. Allowing arbitrary SQL code to pass through dismantles the original purpose. Instead, you could fully clarify and define all possible functionality the clients need - create individual stored procedures - and then provide the documentation on the procedures available. This rigor would self-document the available functionality (with individual stored procs) and fully clarify the application behavior to your clients and yourself. – Jordan Stefanelli Jan 16 '19 at 23:20
  • Yeah, I know Little Bobby Tables. Actually, it's "arbitrary" in the sense that I want to run MY ad-hoc queries (NOT my users'). I was trying to avoid the boilerplate of retrieving a list of databases and then executing a query to each one. – Noel Llevares Jan 17 '19 at 01:01
  • Your first sentence states "stored procedure accepts arbitrary SELECT statement" which, by virtue of the word, is literally anything. Your follow-up question "Is this a security issue?" reinforces that it is still arbitrary and unknown. change "arbitrary" to "statements I pre-define" and you'll get very different answers. – Jordan Stefanelli Jan 17 '19 at 01:26
  • @JordanStefanelli Cool. Sorry for the confusion. – Noel Llevares Jan 17 '19 at 03:59
  • @dashmug pardon if I was rude - if the goal is to query multiple db's for data, you may have already seen this on SO: https://stackoverflow.com/questions/2132654/querying-multiple-databases-at-once – Jordan Stefanelli Jan 17 '19 at 12:36

1 Answers1

4

You can do it in a stored procedure with PREPARE and EXECUTE, but it's considered a security vulnerability to run arbitrary SQL in this way. You said the procedure would only be used by a special account, but just allowing the procedure to exist is a risk. What if privileges are modified and allow anyone to run the procedure?

It will also be pretty slow to run this query across many schemas, because MySQL only runs one thread per query. It will have to run the queries serially, collecting the result in a temp table.

Also you have a limit on the length of any single SQL query, which is max_allowed_packet. This should be pretty large, but you can still exceed the length if you have enough UNION terms.

I worked on a site like the one you describe, where there were many schemas, one per customer, with identical tables. When we wanted to run a query across all schemas, I would run a simple query (no UNION) in many parallel threads, and collect the results in application code.


See https://thedailywtf.com/articles/For-the-Ease-of-Maintenance for a fun story about stored procedures that allow input of arbitrary SQL.

Community
  • 1
  • 1
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • 1
    Right -- It'd be a good idea to constrain the number of worker threads. Don't just blindly run as many threads as the number of schemas. That's a good way to design a system that works well today, but breaks without warning sometime in the future after your customer count grows. – Bill Karwin Jan 16 '19 at 23:39
  • accidentally removed mine other comment blame it to browsing stackoverflow on mine mobile and pressing wrong things you shoudnt. But still even when you dont run as s many threads as schemas.. You still have to store all the tables data or needed data in the memory to work with in the application ofcource which still can break in the future if the customer Count or data grows – Raymond Nijland Jan 16 '19 at 23:50
  • 1
    In the example, it's just a `SELECT COUNT(*)` which would return a single integer per schema. You'd need an awful lot of schemas before that would cause a scalability problem in your app. If you reach that point, then you could store each result in a table. – Bill Karwin Jan 17 '19 at 00:15
  • Yes true i said that also in mine other first accidentally comment. Because this a simple Count query example this should be ok. The topicstarter is talking about arbitrary SELECT queries the possible required data or sorting requirements might be more resource hungry and indeed might require table storage to “clear” the ram memory like you are suggesting – Raymond Nijland Jan 17 '19 at 01:34
  • 2
    That makes me even more sure that a one-size-fits-all procedure that executes arbitrary SQL is not a good idea. One needs to choose what to do with the results based in part on the nature of the query. – Bill Karwin Jan 17 '19 at 02:09