
I have 2 tables which I join very often. To simplify this: the join returns a set of IDs that I use in another (complex) query as part of an IN clause.
So I run this join all the time to get back specific IDs.

To be clear, the query is not horribly slow; it takes around 2 minutes. But since I call this query from a web page, the delay is noticeable.

As a concrete example, let's say the tables I am joining are a Supplier table and a table recording which warehouses each supplier serviced on specific dates. Essentially I get the IDs of suppliers that serviced specific warehouses on specific dates.

The query itself cannot be improved much, since it is a simple join between 2 indexed tables, but the date range condition complicates things.
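For illustration, the query is roughly of this shape (the table and column names below are made up, not my real schema):

```sql
-- Illustrative sketch only: all names are hypothetical.
SELECT DISTINCT s.supplier_id
FROM supplier s
JOIN warehouse_service ws
  ON ws.supplier_id = s.supplier_id
WHERE ws.warehouse_id IN (1, 2, 3)
  AND ws.service_date BETWEEN '2013-01-01' AND '2013-06-30';
```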

I had the following idea, though I am not sure it makes sense.
Since the data I am querying (especially for past dates) does not change, what if I created another table that has the columns from my WHERE clause as its primary key and the list of IDs (comma separated) as its value?
This way the lookup is a simple SELECT of 1 row.
I.e. this way I "pre-store" the supplier IDs I need.
I understand that this is not even in first normal form, but does it make sense? Is there another approach?
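To make the idea concrete, here is a minimal sketch of the lookup table I have in mind (again, all names are hypothetical):

```sql
-- Hypothetical pre-computed lookup table: the WHERE columns form the
-- primary key, the value is the comma-separated list of supplier IDs.
CREATE TABLE supplier_lookup (
  warehouse_id INT NOT NULL,
  date_from    DATE NOT NULL,
  date_to      DATE NOT NULL,
  supplier_ids TEXT NOT NULL,  -- e.g. '17,42,103'
  PRIMARY KEY (warehouse_id, date_from, date_to)
);

-- The expensive join is then replaced by a single-row lookup:
SELECT supplier_ids
FROM supplier_lookup
WHERE warehouse_id = 1
  AND date_from = '2013-01-01'
  AND date_to   = '2013-06-30';
```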

Jim
    You should post your query and the schema defining the tables before any real advice can be given. It does not seem terribly likely that a single join between two tables that are properly keyed should be causing a substantial performance hit. – Deadron Oct 24 '13 at 20:12

2 Answers


It makes sense as a denormalized design to speed up that specific type of query you have.

Though if your date range changes, couldn't it result in a different set of IDs?

The other approach would be to really treat the denormalized entries like entries in a key/value cache like memcached or redis. Store the real data in normalized tables, and periodically update the cached, denormalized form.
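For example, a periodic refresh job could rebuild the denormalized rows with GROUP_CONCAT. This is only a sketch, reusing the hypothetical supplier_lookup and warehouse_service names from your question:

```sql
-- Sketch of a periodic refresh (hypothetical names; run from cron or
-- a scheduled event). REPLACE rewrites each key with the current list.
-- Note: GROUP_CONCAT output is capped by group_concat_max_len.
REPLACE INTO supplier_lookup (warehouse_id, date_from, date_to, supplier_ids)
SELECT ws.warehouse_id,
       '2013-01-01',
       '2013-06-30',
       GROUP_CONCAT(DISTINCT ws.supplier_id ORDER BY ws.supplier_id)
FROM warehouse_service ws
WHERE ws.service_date BETWEEN '2013-01-01' AND '2013-06-30'
GROUP BY ws.warehouse_id;
```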


Re your comments:

Yes, generally storing a list of IDs in a string is against relational database design. See my answer to Is storing a delimited list in a database column really that bad?

But on the other hand, denormalization is justified in certain cases, for example as an optimization for a query you run frequently.

Just be aware of the downsides of denormalization: risk of data integrity failure, poor performance for other queries, limiting the ability to update data easily, etc.

Bill Karwin
  • The date range is always stable. It is not arbitrary from the user. – Jim Oct 24 '13 at 20:14
  • My concern is whether storing a string of PKs is something I will be frowned upon for... – Jim Oct 24 '13 at 20:16
  • My concern is that asking questions about database modelling without referring to the actual model being used is clueless. Without any additional info: denormalisation is bad mkay? – wildplasser Oct 24 '13 at 20:32
  • @wildplasser: That's fair. But the query itself finishes in 1.56 mins. If I post the tables and the query to try to optimize it, by how much would the performance improve? Down to 50 secs? Isn't even that too much for a web page? – Jim Oct 24 '13 at 21:49
  • Jim, it depends. I have helped some of my customers optimize a query in surprising ways, even without denormalization. My best case was improving a query by a factor of 94 million by creating a single index. In other cases, denormalization turns out to be necessary, but because of its downsides, I prefer to use it as a last resort. – Bill Karwin Oct 24 '13 at 21:54
  • Well, if the data model happens to be _extremely bad_, performance _could_ be improved by a factor of 100 or so. If the model is optimal, and/or the query happens to be orthogonal to the model, there will be nothing to gain (maybe _except_ for denormalisation). – wildplasser Oct 24 '13 at 21:55
  • @wildplasser:What do you mean "orthogonal" to the model? – Jim Oct 24 '13 at 22:00
  • Ok I will update the OP with the format of the tables as soon as I can – Jim Oct 24 '13 at 22:02

In the absence of knowing a lot more about your application it's impossible to say whether this is the right approach - but to collect and consider that volume of information goes way beyond the scope of a question here.

Essentially I get the IDs of suppliers that serviced specific warehouses at specific dates.

While it's far from clear why you actually need 2 tables here, nor whether denormalizing the data would make the resulting query faster, one thing of note is that your data is unlikely to change after capture. Hence maintaining the current structure along with a materialized view would have minimal overhead. You first need to test the query performance by putting the sub-query results into a properly indexed table. If you get a significant performance benefit, then you need to think about how you maintain the new table: can you substitute one of the existing tables with a view on the new table, or do you keep both your original tables and populate data into the new table by batch, or by triggers?
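If you go the trigger route, the maintenance could look roughly like this (a sketch only; warehouse_service and the column names are assumptions about your schema):

```sql
-- Hypothetical materialized-view table: one row per qualifying ID.
CREATE TABLE supplier_ids_mv (
  id INT NOT NULL,
  PRIMARY KEY (id)
);

-- Keep it in sync as new service records arrive.
CREATE TRIGGER trg_ws_after_insert
AFTER INSERT ON warehouse_service
FOR EACH ROW
  INSERT IGNORE INTO supplier_ids_mv (id)
  VALUES (NEW.supplier_id);
```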

It's not hard to try it out and see what works - and you'll get a far better answer than anyone here can give you.

symcbean
  • +1. A materialized view, it did not even cross my mind! But I lost you a bit where you say: "test the query performance by putting the sub-query results into a properly indexed table." Do you mean test the performance using the view? – Jim Oct 24 '13 at 22:16
  • I mean build the table from the thing inside `IN (....)`, e.g. `CREATE TABLE materializedview (id INT NOT NULL, PRIMARY KEY(id)); INSERT INTO materializedview (id) SELECT DISTINCT a.id FROM atable a INNER JOIN othertable b ON...`, then replace the `IN(...)` with an INNER JOIN to materializedview – symcbean Oct 24 '13 at 22:25
  • MySQL does not support materialized views. So you were talking about a new table? – Jim Oct 25 '13 at 05:23
  • Yes - a materialized view *is* a new table - some DBMS provide automated means for maintaining these. – symcbean Oct 25 '13 at 08:55