1

I have a very simple table (LOG) containing attributes MAC_ADDR, IP_SRC, IP_DST, URL, PROTOCOL. I want the first n lines containing IP_SRC, URL, #OfOccurrences ordered by decreasing #OfOccurrences for each IP_SRC in my table when PROTOCOL='DNS'.

To be clearer, I want to be able to list the first n most visited pages for each IP_SRC in my table.

I can get the most visited URL for each IP_SRC like this :

select ip_src,url,cnt
from (
    select ip_src,url,count(*) as cnt,protocol
    from log as b group by ip_src,url order by ip_src,cnt desc
) as c
where cnt>=(select MAX(cpt)
            from (select count(*) as cpt from log as b
            where c.ip_src==b.ip_src group by ip_src,url)
           )
      and protocol='DNS';

However, this solution is obviously not optimized.

Here is a more practical code (for the most visited URL for each IP_SRC) :

select ip_src,url,cnt
from (select ip_src,url,count(*) as cnt
      from log where protocol='DNS'
      group by ip_src,url
      order by ip_src,cnt asc)
group by ip_src;

This second option is way more faster ! However, I want the n most visited pages for each IP_SRC, and I can't figure out how to do.

Thanks for your help.

CL.
  • 173,858
  • 17
  • 217
  • 259

3 Answers3

1

Use a common table expression:

WITH Temp1 AS (
  SELECT ip_src, url, count(*) AS cnt
  FROM Log
  WHERE protocol = 'DNS'
  GROUP BY ip_src, url
)
SELECT ip_src, url, cnt
FROM Temp1 AS T1
WHERE url IN (
  SELECT url
  FROM Temp1 AS T2
  WHERE T2.ip_src = T1.ip_src
    AND T2.cnt >= T1.cnt
  ORDER BY cnt DESC
  LIMIT 3  -- or whatever you want it to be
)
ORDER BY ip_src ASC, cnt DESC;
CL.
  • 173,858
  • 17
  • 217
  • 259
  • Actually, this is not as efficient as the temporary table. My table is pretty small (less than 2000 entries), and your solution takes more than 31 seconds (RasperryPi 2). To compare, with the temporary table it takes only 2 seconds ! – thefiercerabbit Sep 16 '16 at 17:05
0
select x.ip_src, x.url, x.cnt
from (select ip_src,url,count(*) as cnt
      from log where protocol='DNS'
      group by ip_src,url
      order by ip_src, count(*) desc) AS x
group by x.ip_src;

Can you try this out ?

Tharsan Sivakumar
  • 6,351
  • 3
  • 19
  • 28
  • It does not improve my second code, and `desc` should be `asc` to collect the highest values. The whole point is to have as many answers per IP_SRC as I want. Here, I only have the highest value... Maybe I wasn't clear enough. – thefiercerabbit Sep 13 '16 at 12:08
0

Finally, with the use of a temporary table, I could manage to get what I wanted.

--First create a temp table of occurences
CREATE TEMPORARY TABLE TEMP1 AS
SELECT ip_src,url,count(*) AS cnt
FROM LOG
WHERE protocol='DNS'
GROUP BY ip_src,url
ORDER BY ip_src,cnt,url DESC;
--Then use a classic limit per group query
SELECT T1.ip_src,T1.url,T1.cnt
FROM TEMP1 AS T1
WHERE T1.url in (
      SELECT T2.url
      FROM TEMP1 AS T2
      WHERE T2.ip_src=T1.ip_src and T2.cnt>=T1.cnt
      ORDER BY T2.cnt DESC
      LIMIT 3 --Or whatever you want it to be
)
ORDER BY T1.ip_src ASC,T1.cnt DESC;

If someone knows how to do the same without the need of a temp table (or explain me why a temp table is a good solution), please express yourself.