
We are building a caching solution for our user data. The data is currently stored in Sybase and is distributed across 5 - 6 tables, with a query service built on top of it using Hibernate, and we are getting very poor performance. Loading the data into the cache takes in the range of 10 - 15 hours.

So we have decided to create a denormalized table of 50 - 60 columns and 5mm rows in another relational database (UDB), populate that table first, and then populate the cache from the new denormalized table using JDBC so the time to build the cache is lower. This gives us much better performance and we can now build the cache in around an hour, but it still does not meet our requirement of building the cache within 5 mins. The denormalized table is queried using the following query (a rough JDBC sketch of this chunked loading is shown after the queries below):

select * from users where user_id in (...)

Here user_id is the primary key. We also tried the query

select * from users where user_location in (...)

and created a non-unique index on user_location as well, but that did not help either.
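For reference, this is a minimal sketch of the kind of chunked primary-key loading described above, using plain JDBC. The table and column names (users, user_id), the chunk handling, and the fetch size are assumptions based on the queries shown, not the actual schema.

    import java.sql.*;
    import java.util.*;

    public class CacheLoader {
        // Fetch rows for one chunk of user ids via a dynamically built IN (...) clause.
        // Assumed table/column names: users, user_id. Chunk size and fetch size are illustrative.
        static List<Map<String, Object>> loadChunk(Connection con, List<Long> ids) throws SQLException {
            String placeholders = String.join(",", Collections.nCopies(ids.size(), "?"));
            String sql = "select * from users where user_id in (" + placeholders + ")";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setFetchSize(1000);                      // stream rows instead of buffering everything
                for (int i = 0; i < ids.size(); i++) {
                    ps.setLong(i + 1, ids.get(i));
                }
                List<Map<String, Object>> rows = new ArrayList<>();
                try (ResultSet rs = ps.executeQuery()) {
                    ResultSetMetaData md = rs.getMetaData();
                    while (rs.next()) {
                        Map<String, Object> row = new HashMap<>();
                        for (int c = 1; c <= md.getColumnCount(); c++) {
                            row.put(md.getColumnLabel(c), rs.getObject(c));
                        }
                        rows.add(row);                      // hand the row off to the cache here
                    }
                }
                return rows;
            }
        }
    }

Even with chunking like this, the overall load time is dominated by the number of round trips and rows, which is why the 5-minute target is hard to hit.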

So is there a way we can make these queries faster? If not, we are also open to considering NoSQL solutions.

Which NoSQL solution would be suited to our needs? Apart from the large table, we would be making around 1mm updates to the table on a daily basis.

I have read about MongoDB and it seems that it might work, but no one has posted any experience with MongoDB with so many rows and so many daily updates.

Please let us know your thoughts.

  • To me it seems you have a rather basic problem and now you're trying to kill it off with lots of additional technology stacks. Why do you need all that data in a cache in the first place? What kind of operations are you performing on that data? Etc. I doubt that the information you provide is enough for a qualified answer... – Lukas Eder Jan 21 '12 at 15:43
  • We will be mostly reading from this data at startup, and then updating it periodically as new user accounts are created, changes are made to their data, or accounts are deleted. – user1162439 Jan 21 '12 at 15:58
  • All the data at once?? Why do you need the data in memory? Modern databases are quite efficient with accessing data on disk... I don't understand what you're trying to do – Lukas Eder Jan 21 '12 at 16:02
  • Yup, that's the reason I want to find which DB will be a good solution to host this table - one that can query very, very fast over large datasets. – user1162439 Jan 21 '12 at 16:48

1 Answer


The short answer here, relating to MongoDB, is yes - it can be used in this way to create a denormalized cache in front of an RDBMS. Others have used MongoDB to store datasets of similar (and larger) sizes to the one you describe, and can keep a dataset of that size in RAM. There are some details missing in terms of your data, but it is certainly not beyond the capabilities of MongoDB; this is one of its more frequently used implementations:

http://www.mongodb.org/display/DOCS/The+Database+and+Caching

The key will be the size of your working data set and therefore your available RAM (MongoDB maps data into memory). For larger data sets, write-heavy workloads, and similar concerns, there are numerous approaches (sharding, replica sets) that can be employed.

With the level of detail provided it is hard to say for certain that MongoDB will meet all of your requirements, but others have already done similar implementations, and based on the information given there is no reason to believe it will not work.
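As a rough illustration of what this denormalized cache might look like, the sketch below uses the MongoDB Java driver to fetch a batch of user documents by _id (analogous to the IN-list query) and to apply a batch of daily updates as upserts. The connection string, collection name (users), and field names are assumptions for illustration, not details from the question.

    import com.mongodb.client.*;
    import com.mongodb.client.model.*;
    import org.bson.Document;
    import java.util.*;

    public class MongoUserCache {
        public static void main(String[] args) {
            // Connection string, database, and collection names are assumptions.
            try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
                MongoCollection<Document> users =
                        client.getDatabase("userdata").getCollection("users");

                // Bulk-load a chunk of denormalized user documents by primary key,
                // analogous to "select * from users where user_id in (...)".
                List<Long> ids = Arrays.asList(1L, 2L, 3L);
                for (Document user : users.find(Filters.in("_id", ids))) {
                    // hand the document off to the in-memory cache here
                    System.out.println(user.toJson());
                }

                // Apply a batch of daily updates as upserts in a single round trip.
                List<WriteModel<Document>> updates = new ArrayList<>();
                updates.add(new UpdateOneModel<>(
                        Filters.eq("_id", 1L),
                        Updates.set("user_location", "NYC"),
                        new UpdateOptions().upsert(true)));
                users.bulkWrite(updates);
            }
        }
    }

Using the natural user_id as the document _id keeps the primary-key lookups cheap, and batching the daily updates through bulkWrite keeps the 1mm updates/day from turning into 1mm round trips.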

Adam Comerford