
(This is almost the same question as here, but I am seeking specific clarification about performance, not help resolving errors.)

Consider a database table containing a medium-to-large number of attributes (columns), most of which are static or infrequently updated. The table is used in read operations extremely frequently.

If two or three of the attributes (columns) are updated frequently, is it advisable to split those off into a separate table for performance reasons?

One reason I had understood this would be a benefit is that cached query results that reference a table are invalidated whenever that table is updated. In my case, the database is MySQL and the table type is InnoDB (row-level locking).

Edit: One of the frequently updated columns is not part of the frequent reads, but it is updated immediately after maybe 25% of the reads.

Another of the frequently updated columns is both part of some frequent reads and updated with each read, although I may batch the updates.

halfer
messy
    you'd have to benchmark. does the cost of now having to join against that separate table offset the savings of moving the columns elsewhere? – Marc B Jul 09 '15 at 19:17
  • @Marc B, is there large cost of a simple inner join in frequent reads? One of the columns in question won't need a join in its frequent usage, but another will. I assume if join isn't used much, the separate table is a clear win? – messy Jul 09 '15 at 20:07
  • depends on how you're going to be reading those fields. if they're frequently retrieved as part of normal `select` queries on the table, then the join overhead is going to add up quickly. and are you sure it's even necessary? just because you're changing the fields frequently doesn't mean everything else slows down... so like I said, test it both ways and choose whichever REALLY is faster, not the one you THINK will be faster. – Marc B Jul 09 '15 at 20:11
  • Join overhead because of the Cartesian join (before other conditions narrow the set)? Edited my question to clarify one reason why I thought it was necessary: I understood the query cache for a table is wiped when table is updated. Thought maybe there are other factors too. Thanks much for your input! – messy Jul 09 '15 at 23:12
  • Do you suffer performance degradation already? I would look at partitioning the table over several disks i.e. over several physical spindle in a RAID Array. This will allow much more concurrent updates as the IO bandwidth for the table is increased. – Namphibian Jul 09 '15 at 23:45

1 Answer


If you have a wide table and/or a relatively narrow set of fields which are accessed much more frequently than others, it makes sense to separate those fields from the rest. However, instead of normalizing them to their own table, consider creating a subtable within the existing table. That is, create a covering index.

select  a, b, c
from    t
where   c = ? and d = ? and e = ?;

In this simple case, you could create a single index over a, b, c, d, and e. The query may then be satisfied by accessing only the index, never the table rows. The same applies to updates:

update  t
set     d = 'something'
where   a = ? and b = ? and e = ?;
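A minimal DDL sketch of such an index (the table name `t`, the index name, and the column order are all placeholders here; ordering the where-clause columns first lets the index serve filtering as well as covering):

```sql
-- Hypothetical covering index: leading columns match the where
-- clauses, trailing columns make it "covering" for the select list.
CREATE INDEX ix_cover ON t (c, d, e, a, b);
```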

As long as only the fields defined in the index are called out, only the index will need to be used.
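One way to confirm that a given query really is served from the index alone is to inspect its plan (this sketch assumes the placeholder table `t` with an index over c, d, e, a, b):

```sql
-- If the index covers the query, MySQL reports "Using index"
-- in the Extra column of the EXPLAIN output.
EXPLAIN SELECT a, b, c
FROM    t
WHERE   c = 1 AND d = 2 AND e = 3;
```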

But never assume anything. Develop timing tests so you can make meaningful before-and-after comparisons. Develop regression tests so you can verify that, if the change does speed up the operations you care about, it doesn't do so by unduly slowing down other important operations.

Here's another twist to consider. Take the query above. Suppose many thousands or millions of rows must be examined (where clause) in order to find a very few results. Would it then be better to create two covering indexes -- one for the selection list and one for the where clause? And if it is better (faster) then should field c be defined in the selection index and the where index or just in the where index?

I don't know the answers to those questions. You'll have to test.
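For concreteness, the two candidate indexes from that question might look like this (placeholder names again; whether this beats a single combined index is exactly what the timing tests would have to settle):

```sql
-- Candidate split: one index per role.
CREATE INDEX ix_where  ON t (c, d, e);  -- serves the where clause
CREATE INDEX ix_select ON t (a, b, c);  -- serves the select list
```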

UPDATE: If I understand the comment, you have many queries against some fields and updates against other fields, and you don't want them to interfere with each other since they use different sets of fields. You can already "isolate" the queries from the updates: just query under "no lock" conditions. (Note that this permits dirty reads, so it is only appropriate when slightly stale or uncommitted data is acceptable.)

In MySQL, the procedure is (Note: this is for InnoDB only. Refer to the docs for other engines):

set session transaction isolation level read uncommitted;
query1...
query2...
query3...
set session transaction isolation level repeatable read;

or simply

set transaction isolation level read uncommitted;
query1...
query2...
query3...
commit;

You still may want to look at covering indexes for the queries and/or updates, but now that is a separate issue.

TommCatt
  • Thank you for thoughtful answer. I think my indexing is OK for where clauses but interesting idea to use for select also! But question is more about segregating the effects of updates away from frequently selected table (only sometimes do those two overlap). Maybe I asked wrong question... – messy Jul 10 '15 at 18:06