0

We want to generate a report with 10,000 records by querying 12 million records from the database and show in an user interface, it's a REST API call. We have two approaches right now to achieve this.

Here is the little background about our database tables design:

Design#1: We uses SQL database and we have a big legacy database table(Single) where it has more than 12 million records in a given time, we always keeps one year data in this table. Every month we have a backup policy, which moves data to a history table, even with this backup policy we end up more than 12 million records.

Design#2: As part of the above big table redesign, we created 12 tables based on certain criteria and we are persisting the above 12 million records into these 12 tables more or less equally, a million records per each table.

Approach#1: Query 12 tables simultaneously using java executor API with callable tasks and send the result to the caller.

Approach#2: Query the single big legacy table and send the result to the caller.

Please suggest me which approach better suits with optimal performance.

Let me know if anything unclear.

MiguelKVidal
  • 1,498
  • 1
  • 15
  • 23
  • 2
    How many columns? What datatypes used? What are your index? Is your WHERE clause only on indexed columns? Do you have any aggregator function on this query? There are many things that will impact on performance. Maybe, in your case, joining 12 tables is costly than a big single table. Can you give us more hints and info about your problem? – MiguelKVidal Jul 13 '17 at 23:05
  • 15 columns, data types used are number, varchar, date. – Om Sri Udaya Kumar Gade Jul 13 '17 at 23:22
  • Yes WHERE clause is on the indexed columns, no aggregator functions. Idea is not to join all the 12 tables in the database, get the result of 12 tables to java and do some business logic, finally add the results to a java list and return the result to the caller. – Om Sri Udaya Kumar Gade Jul 13 '17 at 23:24
  • You need to retrieve the least amount objects from the database. Are you using SQL queries or something JPA-based, like Hibernate? Do you have any cache engine? 10K records to return in one go can be a lot, but it's hard to say based in you info. Suppose the average size of your lines are 1Kb, then your result set would be 10Mb. If a smartphone feches it, it can be a pain, compared to a desktop. Will you show 10K lines or only a report based on it? – MiguelKVidal Jul 13 '17 at 23:32
  • Using spring JDBC with SQL queries, no cache engine. From all the tables we are expecting maximum of 1000 records in each REST client request, but the table size is more because we keep an year data, primarily trying to pick a best approach in querying single large table vs querying multiple tables with optimal performance. Please let me know if any details required. – Om Sri Udaya Kumar Gade Jul 14 '17 at 05:43
  • From your info, I would go with one big table. But please, take a look at those answers: https://stackoverflow.com/questions/1125004/which-is-more-efficient-multiple-mysql-tables-or-one-large-table https://stackoverflow.com/questions/20222853/what-is-better-faster-joining-many-tables-or-select-one-big-table https://stackoverflow.com/questions/13810073/mysql-multiple-tables-or-one-big-table https://stackoverflow.com/questions/37869223/sql-multiple-tables-vs-one-big-table Maybe those links help you out with more info than I can write in a comment. Remember: with one big table, use index! – MiguelKVidal Jul 15 '17 at 02:00
  • There are many approaches for different kinds of problems. One big table? Then you (maybe) have duplication. Many small tables, then you will need JOINs or multiple queries. What's the best for your problem? Hard to answer without really knowing your problem. – MiguelKVidal Jul 15 '17 at 02:05

0 Answers0