I am using Ruby on Rails 3.0.7 and MySQL 5. In my application I have two database tables, say TABLE1 and TABLE2, and for performance reasons I have denormalized some data in TABLE2, so that values from TABLE1 are repeated in it. Now I need to update some of those values in TABLE1 and, of course, I must also properly update the denormalized values in TABLE2.
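
For concreteness, assume a hypothetical schema like the following (all names are invented for illustration; this is roughly what the denormalization looks like):

    # Hypothetical Rails 3.0 migration: table2s carries a denormalized copy
    # of table1s.name so that reads on table2s avoid a join.
    class CreateExampleTables < ActiveRecord::Migration
      def self.up
        create_table :table1s do |t|
          t.string :name                # the source value
        end
        create_table :table2s do |t|
          t.references :table1          # FK back to the source row
          t.string :table1_name         # denormalized copy of table1s.name
        end
        add_index :table2s, :table1_id
      end

      def self.down
        drop_table :table2s
        drop_table :table1s
      end
    end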

What can I do to update those values in a performant way? That is, if TABLE2 contains a lot of rows (1,000,000 or more), what is the best way to keep both tables updated (techniques, practices, ...)?

What can happen during the time it takes to update the database tables? For example, could a user run into problems when accessing web site pages that involve those denormalized values? If so, what are those problems and how can I handle the situation?

user502052
  • You should use a trigger (or perhaps two triggers--one on each table, if you allow updates to both tables) to keep these in sync--so that when one table is changed, the change is propagated to the other table. – Jonathan Hall Jun 29 '11 at 08:20
  • @Flimzy - As you say there will be a lot of work in order to update large tables... what about performance? – user502052 Jun 29 '11 at 08:22
  • Well, it means you're updating two tables every time you update one--so the performance could be roughly half (depending on many factors)--but that's what you want, right? You can't have data consistency without the performance hit that comes from keeping your data consistent. Now if your data doesn't have to be real-time consistent, you might be able to batch the propagation from one table to the other every hour/day/week/whatever, which might be more efficient, depending on your situation. But based on the info in your question, I can't say if that would work for you. – Jonathan Hall Jun 29 '11 at 08:25
  • @Flimzy - What information do you need in order to get a more "prolific" answer? – user502052 Jun 29 '11 at 08:27
  • Well, like I said, I don't know if real-time updates are important in your application. – Jonathan Hall Jun 29 '11 at 08:29
  • @Flimzy - Yes, those are so because users should continue to use the application without problems. – user502052 Jun 29 '11 at 08:33
  • If real-time updates are a requirement, then triggers are probably your only option. And they shouldn't add _that_ much load, proportionally speaking. At least not if they're written well, and your tables are properly indexed, etc. – Jonathan Hall Jun 29 '11 at 08:33
  • @Flimzy - And, *BTW*, how long can it take (approximately) for a table of 1,000,000 records to be updated, in a pessimistic case? – user502052 Jun 29 '11 at 08:37
  • Was denormalizing really the best approach to improve performance? Was there no caching, or query-optimization that could have achieved the same thing? – Pavling Jun 29 '11 at 10:21
  • @Pavling - I think not, because I also use it to store user preferences. – user502052 Jun 29 '11 at 21:06

2 Answers


There are a few ways to handle this situation:

  1. You can use a database trigger. This is not a database-agnostic option, and RoR support for it is non-existent as far as I know. If your situation requires absolutely no data inconsistency, this would probably be the most performant way to achieve your goal, but I'm not a DB expert (a trigger sketch follows this list).
  2. You can use a batch operation to sync the two tables periodically. This method allows the two tables to drift apart and then re-synchronizes the data every so often. If your situation allows this drift to occur, this can be a good option, as it allows the DB to be updated during off hours. If you need to sync every 5 minutes, you will probably want to look into other options. This can be handled by your Ruby code, but will require a background job runner of some sort (cron, delayed_job, redis, etc.); a rake-task sketch follows this list.
  3. You can use a callback from inside your Rails model, e.g. "after_update :sync_denormalized_data". This callback will be wrapped in a database-level transaction (assuming your database supports transactions). You get Rails-level code, consistent data, and no need for a background process, at the expense of making two writes every time (a model sketch follows this list).
  4. Some mechanism I haven't thought of....
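
To make option 1 concrete, here is a minimal sketch of a MySQL trigger installed from a Rails migration, assuming the hypothetical table1s/table2s schema from the question. The trigger lives in the database itself, so every code path that updates table1s is covered:

    class AddTable1SyncTrigger < ActiveRecord::Migration
      def self.up
        # Whenever a table1s row changes, push the new name into every
        # denormalized copy. MySQL 5 supports triggers natively.
        execute <<-SQL
          CREATE TRIGGER table1s_after_update
          AFTER UPDATE ON table1s
          FOR EACH ROW
            UPDATE table2s
            SET table1_name = NEW.name
            WHERE table1_id = NEW.id;
        SQL
      end

      def self.down
        execute "DROP TRIGGER IF EXISTS table1s_after_update"
      end
    end

One trade-off: the sync logic is now invisible to Rails, so it is worth documenting prominently.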
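
For option 2, a sketch of how the periodic sync might look as a rake task scheduled by cron (the task name and columns are assumptions):

    # lib/tasks/denormalized_sync.rake
    namespace :db do
      desc "Re-sync denormalized TABLE1 values into TABLE2"
      task :sync_denormalized => :environment do
        # One set-based UPDATE: MySQL joins the tables itself rather than
        # Ruby iterating over 1,000,000+ rows. The WHERE clause limits
        # writes to rows that actually drifted (use <=> if NULLs matter).
        ActiveRecord::Base.connection.execute(<<-SQL)
          UPDATE table2s t2
          JOIN table1s t1 ON t1.id = t2.table1_id
          SET t2.table1_name = t1.name
          WHERE t2.table1_name <> t1.name
        SQL
      end
    end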
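
For option 3, a sketch of the after_update callback, again against the hypothetical schema. update_all issues a single UPDATE statement, so the copies are never instantiated in Ruby:

    class Table1 < ActiveRecord::Base
      has_many :table2s

      after_update :sync_denormalized_data

      private

      # Runs inside the same transaction as the save, so either both
      # tables change or neither does.
      def sync_denormalized_data
        if name_changed?
          Table2.update_all({:table1_name => name}, {:table1_id => id})
        end
      end
    end

Even as a single UPDATE, a parent with very many copies can hold locks for a while, which is the performance hit discussed in the comments above.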

These types of issues are very application specific. Even within the same application you may use more than one of the methods depending on the flexibility and performance requirements involved.

salt.racer

Or you can maintain a normalized set of data alongside your two denormalized tables and periodically sync them. Another way is to keep a normalized table structure for maintaining the data (insert/update/delete) and use a materialized view for the reporting, which is what you are effectively achieving with the denormalized copy. You can set the refresh parameters of the materialized view as per your requirements.
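
Note that MySQL has no native materialized views, so the scheduled-refresh behavior described here would have to be emulated. One option is a MySQL event (available from MySQL 5.1, with the event scheduler enabled); a sketch under the same hypothetical schema as above:

    class AddDenormalizedRefreshEvent < ActiveRecord::Migration
      def self.up
        # Requires MySQL 5.1+ and: SET GLOBAL event_scheduler = ON;
        # Refreshes the denormalized copies once an hour.
        execute <<-SQL
          CREATE EVENT refresh_table2_denormalized
          ON SCHEDULE EVERY 1 HOUR
          DO
            UPDATE table2s t2
            JOIN table1s t1 ON t1.id = t2.table1_id
            SET t2.table1_name = t1.name;
        SQL
      end

      def self.down
        execute "DROP EVENT IF EXISTS refresh_table2_denormalized"
      end
    end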

ViSu