0

I have a code where I am getting the data from various sources and sorting and ordering them to be send to the user.

I am taking the data by firing a query which contains multiple joins to a list of DTO, then again I am firing another query which further contains multiple joins to the same list of DTO. then I am adding both the lists of DTOs to be presented to the user.

Query 1: Select * from TABLE1, TABLE2....

Query 2: Select * from TABLE5, TABLE7....

dto1.addAll(dto2);

dto1.sort(Comparator....);

I am sorting it again programatically is because of below reason:

Query 1 returned sorted data lets assume
1,2,3,4
Query 2 returned sorted data lets assume
1,2,3,4

After combining both the lists, I will get
1,2,3,4,1,2,3,4
Expected data
1,1,2,2,3,3,4,4

My question is, on which case performance will be better?

  1. fetch the sorted data from both the queries, add the list and then sort and order them.
  2. fetch the unsorted data from both the queries, add the list and then sort and order only once.

In the first case, it will get sorted thrice, but on the second case, it will sort and order only once.

When I tested with putting hundreds of thousands of records in the table, I didn't found much difference, second case was a bit faster than the first one.

So, in case of efficiency and performance, which one should be recommended?

James Z
  • 12,209
  • 10
  • 24
  • 44
ruhewo
  • 97
  • 1
  • 10
  • If the queries target only one database, change the queries into one and let the DBMS do the sorting. Otherwise you could let it sort to and later merge/sort the two results, which should be quite efficient. At least in theory. But I don't know if and how the objects you use support such an approach. – sticky bit Dec 22 '18 at 18:36
  • 2
    *So, in case of efficiency and performance, which one should be recommended?* The one that demonstrates better performance when you perform valid benchmarks. Real optimizations would involve searching for algorithmically better solutions to what you are currently doing. Obviously sorting once should be 3x faster then sorting thrice; but you haven't really demonstrated that is what your code does in either case (if you are fetching sorted data, **why** do you then need to "sort and order them" again? It's already sorted)... – Elliott Frisch Dec 22 '18 at 18:42
  • `FROM TABLE1, TABLE2`? Don't use the comma syntax to join tables. It was already outdated end of last century. Use the [JOIN](https://stackoverflow.com/a/53583444/4003419) syntax instead. – LukStorms Dec 22 '18 at 18:46
  • @stickybit, as I mentioned, I couldn't merge the queries. Also, if I sort in the queries, so after merge, I again will have to sort it to get the final DTO. – ruhewo Dec 22 '18 at 18:48
  • @ElliottFrisch, I edited my question to demonstrate the reason for sorting third time. – ruhewo Dec 22 '18 at 18:52
  • 1
    Your ***merge*** is broken. You already know the two lists are sorted, and merging two sorted lists can produce a sorted output in linear time. – Elliott Frisch Dec 22 '18 at 18:59

3 Answers3

1

I think all three will have similar performance. You could get a little bit higher speed using one or the other but I don't think it will be significant.

Now, in terms of load, that's a different story. Are you more limited by CPU resources (in your local machine) or by database resources (in the remote DB server)? Most of the time the database will be sitting there idle while your application will be processing a lot of other stuff. If that's the case, I would prefer to put the load on the database, rather than the application itself: that is, I would let the database combine and sort the data in a single SQL call; then the application would simply use the ready-to-use data.

Edit on Dec 22. 2018:

If both queries run on the same database, you can run them as a single one and combine the results using a CTE (Common Table Expression). For example:

with
x (col1, col2, col3, col4, ...) as (
  select * from TABLE1, TABLE2... -- query 1
  union all
  select * from TABLE5, TABLE7... -- query 2
)
select * from x
order by col1

The ORDER BY at then end operates over the combined result. Alternatively, if your database doesn't support CTEs, you can write:

select * from (
  select * from TABLE1, TABLE2... -- query 1
  union all
  select * from TABLE5, TABLE7... -- query 2
) x
order by col1
The Impaler
  • 45,731
  • 9
  • 39
  • 76
  • Thanks for your response, I cannot combine the queries into one, if possible, I would have chosen that only. I have to make two different database calls. As I said, if I sort into the queries, I will still have to sort in the program since the two sorted queries will contains data for their own, so when we merge them ,we still have to sort them. – ruhewo Dec 22 '18 at 18:46
  • Any reason for using `WITH` and its baggage, instead of doing it all in one statement? – Rick James Dec 22 '18 at 22:57
  • If both queries run on the same database you can combine them into a single. See my edit. – The Impaler Dec 22 '18 at 22:57
  • @TheImpaler - Still overkill; see my Answer. – Rick James Dec 22 '18 at 22:59
1

Do it all in MySQL:

( SELECT ... )
UNION ALL
( SELECT ... )
ORDER BY ...

Don't worry about sorting in the two selects; wait until the end to do it.

ALL assumes that there are no dups you need to get rid of.

This approach may be fastest simply because it is a single SQL request to the database. And because it does only one sort.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Your solution is quite concise. I never really thought about it, but does the `ORDER BY` work over the combined result set? – The Impaler Dec 22 '18 at 22:59
  • @TheImpaler - Yes. This 'feature' has been there since at least version 4.0. – Rick James Dec 22 '18 at 23:00
  • @TheImpaler - In older versions, leaving out the parens left an ambiguity as to whether the `ORDER BY` belongs to the last `SELECT` or to the `UNION`. I use the parens to make it clear. – Rick James Dec 22 '18 at 23:02
  • This solution works but as I already mentioned that I can't join both the queries since it will generate tight coupling in the code. I will have to take the data separately – ruhewo Dec 24 '18 at 05:07
0

I think 2nd one is better performer because if you run a sorting algorithm after merging your two list. So you don't need to run sort query to db. So database sorting query cost not requiring your 2nd query.

But if you retrieve data with sorted order and then again you run sorting algorithm then it will must take some more cost to execute although its negligible.

flopcoder
  • 1,195
  • 12
  • 25