2

I have a Mysql table with some data (> million rows). I have a requirement to sort the data based on the below criteria

1) Newest

2) Oldest

3) top rated

4) least rated

What is the recommended solution to develop the sort functionality

1) For every sort reuest execute a DBQuery with required joins and orderBy conditions and return the sorted data

2) Get all the data (un sorted) from table, put the data in cache. Write custom comparators (java) to sort the data.

I am leaning towards #2 as the load on DB is only once. Moreover, application code is better than DBQuery.

Please share your thoughts....

Thanks, Karthik

Karthik
  • 91
  • 3
  • 9

7 Answers7

3

Do as much in the database as you can. Note that if you have 1,000,000 rows, returning all million is nearly useless. Are you going to display this on a web site? I think not. Do you really care about the 500,000th least popular post? Again, I think not.

So do the sorts in the database and return the top 100, 500, or 1000 rows.

Tony Ennis
  • 12,000
  • 7
  • 52
  • 73
0

Fetching a million rows from the database sounds like a terrible idea. It will generate a lot of networking traffic and require quite some time to transfer all the data. Not mentioning amounts of memory you would need to allocate in your application for storing million of objects.

So if you can fetch only a subset with a query, do that. Overall, do as much filtering as you can in the database.

And I do not see any problem in ordering in a single queue. You can always use UNION if you can't do it as one SELECT.

Dmytro Shevchenko
  • 33,431
  • 6
  • 51
  • 67
0

It's much faster to do it in the database:

1) the database is optimized for I/O operations, and can use indices, and other DB optimizations to improve the response time

2) taking the data from the database to the application will get all data into memory. The app will have to look all the data to redorder it without optimized algorithms

3) the database only takes the minimun necessary data into mamemory, which can be much less than all the data whihc has to be moved to java

4) you can always create extra indices on the database to improve the query performance.

JotaBe
  • 38,030
  • 8
  • 98
  • 117
0

I would say that operation on DB will be always faster. You should ensure that caching on DB is ON and working properly. Ensure that you are not using now() in your query because it will disable mysql cache. Take a look here how mysql query cache works. In basic. Query is cached based on string so if query string differs every time you fetch no cache is used.

michal.kreuzman
  • 12,170
  • 10
  • 58
  • 70
0

AFAIK usually it should run faster if you let the DB sort your data.

And regarding code on application level vs db level I would agree in the case of stored procedures but sorting in SELECTs is fine IMHO.

If you want to show the data to the user also consider paging (in which case you're better off with sorting on the db level anyway).

Puce
  • 37,247
  • 13
  • 80
  • 152
0

You do not have four tasks, you have two:

sort newest IS EQUAL TO sort oldest AND sort top rated IS EQUAL TO sort least rated.

So you need to make two calls to db. Yes sort in db. then instead of calling to sort every time, do this:

1] track the timestamp of the latest record in the db
2] before calling to sort and retrieve entire list, check if date has changed
3] if date has not changed, use the list you have in memory
4] if date has changed, update the list 
kasavbere
  • 5,873
  • 14
  • 49
  • 72
0

I know this is an old thread, but it comes up in my search, so I'd like to post my opinion. I'm a bit old school, but for that many rows, I would consider dumping the data from your database (each RDBMS has it's own method. Looks like MySQLDump command for MySQL: Link ) You can then process this with sorting algorithms or tools that are available in your java libraries or operating system.

Be careful about the work your asking your database to do. Remember that it has to be available to service other requests. Don't "bring it to it's knees" servicing only one request, unless it's a nightly batch cycle type of scenario and you're certain it won't be asked to do anything else.

Community
  • 1
  • 1
Joe Hayes
  • 101
  • 3