-1

Any idea of how we can improve this query execution ? (maybe with some pre-aggregation)?

SELECT p.segment, country, count(distinct userid)
from pixel_data_opt p
WHERE country in ('US') 
  and segment is not null
GROUP BY p.segment, country;

I tried the below but it didn't help -

select  segment, country,sum(cnt)
from 
  (SELECT p.segment, country,  userid,count(*) as cnt
   from pixel_data_opt p
   WHERE country in ('US') 
     and segment is not null
   GROUP BY p.segment, country,userid
  )
group by 1,2;
APC
  • 144,005
  • 19
  • 170
  • 281
  • 2
    Don't spam tags of all database systems, pick the one you are using and tell us which it is. Is this for MySQL, Oracle or SQL Server? – DavidG Dec 15 '19 at 11:56
  • 1
    Please tag appropriate RDBMS MySQL <> SQL Server <> Oracle. – James Dec 15 '19 at 11:56
  • An index on segment, country, and userid will probably help but we can't help without knowing the specific DBMS you are using. – Dan Guzman Dec 15 '19 at 11:59
  • Oracle query optimisation is about balancing many different factors such as data volumes. For instance, in a comment you said *"number of rows is hundred of billions"*. This is the sort of information you need to put in your question. Please read [this answer on asking Oracle tuning questions](https://stackoverflow.com/a/34975420/146325) and edit your question to include the required information. – APC Dec 15 '19 at 14:00
  • 1
    One option that was not mentioned is using approx_count_distinct instead of count(distinct ...). This is possible if you do not really need 100% correct answer, but a close enough one would be ok. This function is available in 12.1 and later. The approx version does not do a sort, and uses no temp space and much less memory. – gsalem Dec 15 '19 at 15:59

2 Answers2

0

There's nothing wrong with your first query - though, it could have been where country = 'US' - but optimizer (as far as Oracle is concerned) is smart enough to figure it out.

Is the country column indexed? If not, do that.

Also, gather statistics on the table.

It would probably help if you posted some more info, e.g. number of rows involved, explain plan as it shows figures that mean something.

Littlefoot
  • 131,892
  • 15
  • 35
  • 57
0

For this query:

SELECT p.segment, country, count(distinct userid)
FROM pixel_data_opt p
WHERE country in ('US') AND
      segment is not null
GROUP BY p.segment, country;

You want an index on the table. There are several approaches. One reasonable choice is: pixel_data_opt(country, segment, userid).

I would suggest rewriting the query as:

SELECT p.segment, 'US' as country, count(distinct userid)
FROM pixel_data_opt p
WHERE country in ('US') AND
      segment is not null
GROUP BY p.segment;

and using the above index.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786