
Let's say I have a table called `order` with the following data. I need to get the customer_name along with the number of orders each customer has placed.

Table name: order

id | customer_name | item
1  | Siddhant      | TV
2  | Siddhant      | Mobile
3  | Sankalp       | Football

Desired output:

customer_name | no_of_orders
Siddhant      | 2
Sankalp       | 1

I tried the two queries below to get the result:

select customer_name, count(customer_name) as no_of_orders
from "order"
group by customer_name;

This gives me the correct result but takes around 10.5 seconds to run.

select ord.customer_name, count(ord1.customer_name) as no_of_orders
from "order" ord
inner join "order" ord1 on ord1.customer_name = ord.customer_name
group by ord.customer_name;

This gives me the square of the correct count in the result, but runs in about 2 seconds. I can take the square root to get the actual count.

I understand why the second query gives the square of the actual count in the output, but can someone explain why it runs so fast compared to the first query?
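
To make the squaring concrete with the sample data above: a customer with n rows matches each of those rows against all n of them, so the self-join produces n × n row pairs per customer. An illustrative query (not one of the original attempts) showing the intermediate join result for Siddhant:

select ord.id as left_id, ord1.id as right_id
from "order" ord
inner join "order" ord1 on ord1.customer_name = ord.customer_name
where ord.customer_name = 'Siddhant';

-- left_id | right_id
-- 1       | 1
-- 1       | 2
-- 2       | 1
-- 2       | 2

Counting these four pairs gives 4 = 2², not 2.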

PS: I am running these in Oracle SQL Developer.

  • *"can someone explain why it runs so fast compared to the first query?"* It's probably just an artefact of running two queries. The first query does all the work to read the rows from disk into memory, then the second benefits from that and does reads from memory. It's called the warm cache effect. Beyond that please read [this answer on asking Oracle database performance question](https://stackoverflow.com/a/34975420/146325). – APC Nov 18 '19 at 06:37
  • Is there any index on column `customer_name`? Also, what is the time if you use `count(1)` instead of `count(customer_name)` in the first query? – Popeye Nov 18 '19 at 06:47
  • @APC, I ran the second one first. – siddhant Nov 18 '19 at 06:53
  • @Tejash, no difference with count(1). Yes, there is an index on customer_name. – siddhant Nov 18 '19 at 06:58
  • In which case please follow the link I gave you, and post all the information we need to understand your scenario. Basically you're making a bald assertion that your database is doing a weird thing and asking us to explain why. But without more details all we can say is, *"yes, if that's what is actually happening then that is weird"*. There certainly is an explanation but it will be buried in those details. – APC Nov 18 '19 at 07:00
  • Most probably you are benchmarking the **time to get the first page**. This gives the observed result, as the query with `group by` must use a `full table scan`, so you wait until the whole table is read. The second query uses `nested loops`, so the first few keys (`customer_name`) are processed and returned quickly (using index access). Simply add an `order by` to the query and the *magic effect* disappears, as you now force all rows to be read before the first page is returned (see the sketch after these comments). – Marmite Bomber Nov 18 '19 at 07:16
  • Please provide both plans and check the last time you gathered stats on this table. I guess they've become stale, so you're getting weird results. – Roman Nov 18 '19 at 20:38
  • @Marmite that is correct. I added a `select count(*)` on top of the result for both queries, and the self-join one was taking much more time because it now had to fetch all the records first to take the count. – siddhant Nov 19 '19 at 05:49
  • Nice to hear, @siddhant, that the remote (probabilistic guess) diagnosis was right. I'll consider posting it as an answer if nobody else does it before me ;) – Marmite Bomber Nov 19 '19 at 07:00
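
A minimal way to reproduce the first-page effect described in the comments (reusing the same `"order"` table): wrap each query in a `count(*)` so that all rows must be produced before anything is returned, then compare the timings again.

-- Forcing full materialisation removes the first-page advantage
-- of the nested-loops plan.
select count(*) from (
  select customer_name, count(customer_name) as no_of_orders
  from "order"
  group by customer_name
);

select count(*) from (
  select ord.customer_name, count(ord1.customer_name) as no_of_orders
  from "order" ord
  inner join "order" ord1 on ord1.customer_name = ord.customer_name
  group by ord.customer_name
);

Adding an `order by customer_name` to each original query has the same effect: the first page can only be returned after all rows have been read.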

1 Answer


The first version is the one you should be using here:

SELECT customer_name, COUNT(customer_name) AS no_of_orders
FROM "order"
GROUP BY customer_name;

Absent a WHERE or HAVING clause, adding an index might not be too helpful here, because Oracle basically has to touch every record in the table in order to do the aggregation.

As to why the second version appears to be faster, I speculate that the benchmarks you are using are not representative, because they are based on a fairly small table. If you scale your table data to tens of thousands of rows, I predict that the first version will be substantially faster.
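
One quick way to test that prediction (the row count and generated names below are arbitrary): populate the table with synthetic rows using Oracle's `connect by level` row generator, then time both queries again.

-- Insert 50,000 synthetic orders spread across 1,000 customer names.
-- The ids are offset by 3 to avoid colliding with the sample rows.
insert into "order" (id, customer_name, item)
select level + 3,
       'customer_' || mod(level, 1000),
       'item_' || level
from dual
connect by level <= 50000;

commit;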

Tim Biegeleisen