3

I have two tables :

main : id_main, field1, filter

main_logs (50 millions lines) : auto inc, id_main, path

I looking for the following results : id_main, field1, most common path

I tried the following query :

select id_main, 
  field1, 
  (select path, count(*) as cpt 
   from main_log 
   where main_log.id_main=main.id_main group by path order by cpt desc limit 1) 
from main 
where filter in (1,3,5);

Mysql return : Operand should contain 1 column(s)

If I remove path, results are correct, but I miss the path value.

select id_main, 
  field1, 
  (select path, count(*) as cpt 
   from main_log 
   where main_log.id_main=main.id_main group by path order by cpt desc limit 1) 
from main 
where filter in (1,3,5);

I don't need the result of count(*) but I need it for the "order by"

How can I write this query to get my results ? Thanks

main

id_main     | field1    | filter
1       | red       | 1
2       | blue      | 3
3       | pink      | 1

main_logs

autoinc     | id_main   | path
1       | 1         | home1
2       | 1         | home2
3       | 1         | home2
4       | 2         | house2
5       | 2         | house7
6       | 2         | house7
7       | 3         | casee

expected result

id_main     | fields1   | most common path
1       | red       | home2
2       | blue      | house7
3       | pink      | casee
benfromaix
  • 105
  • 1
  • 3
  • 9

4 Answers4

3

Try this:

SELECT m.id_main, m.field1, A.path 
FROM main m 
INNER JOIN (SELECT * 
            FROM (SELECT id_main, path, COUNT(*) cnt
                  FROM main_log ml  
                  WHERE EXISTS (SELECT * FROM main m WHERE ml.id_main = m.id_main AND filter IN (1,3,5))
                  GROUP BY id_main, path 
                  ORDER BY cnt DESC
                  ) AS A 
            GROUP BY id_main
            ) AS A ON m.id_main = A.id_main;

OLD CODE IGNORE

SELECT m.id_main, m.field1, A.path 
FROM main m 
INNER JOIN (SELECT * FROM (SELECT id_main, path, count(*) cnt
            FROM main_log 
            GROUP BY id_main, path 
            ORDER BY cnt DESC) GROUP BY id_main) as A on m.id_main = A.id_main 
WHERE filter IN (1,3,5);
Saharsh Shah
  • 28,687
  • 8
  • 48
  • 83
  • I think it could works on small dataset, but in my case mysql gone crazy (ERROR 126 (HY000): Incorrect key file for table '/tmp/#sql_11dc_1.MYI'; try to repair it) because the main_log table is huge (5Go). This query need to execute the subselect group by on all the table, without any condition. I can't do that – benfromaix Dec 25 '12 at 16:55
  • Do you have an index on `id_main` in both tables? – Barmar Dec 25 '12 at 17:01
  • It's the primary key in "main" table and it's an index in "main_log" – benfromaix Dec 25 '12 at 17:12
  • Add index on path column and try to run query – Saharsh Shah Dec 25 '12 at 17:14
  • actually there is one on path too. But a group by path without any where condition requiere a huge temp table size. This is why I tried to get the most common path for each id_main instead of a huge query – benfromaix Dec 25 '12 at 17:18
  • Check my updated answer and provide me the output of EXPLAIN query – Saharsh Shah Dec 25 '12 at 17:35
  • I tried with an explain but mysql can't provide an answer with the smale error answer. The subset from main_log can't be queried without condition, the table contain 40 millions lines, and I only need 5% of line with my option – benfromaix Dec 25 '12 at 18:11
  • Check my updated query and update me if it works or not. I will remove old query if my new query works – Saharsh Shah Dec 26 '12 at 04:57
3

You need to use :

SELECT id_main, field, 
    (SELECT path 
    FROM main_logs 
    WHERE id_main=main.id_main 
    GROUP BY path 
    ORDER BY count(path) DESC 
    LIMIT 1) AS most 
FROM main 
WHERE filter IN (1,3,5);

Tested, it's working.

lucif
  • 108
  • 1
  • 6
0

You are returning two columns int he sub query. You just need to return one.

Sample data:

ID_MAIN     FIELD1  FILTER
1   h   1
2   x   2
3   y   3


AUTOINC     ID_MAIN     PATH
11  1   abc
12  2   abd
13  1   xyz
14  1   ghf
15  2   xyz

Try tis out: SQLFIDDLE

Query:

select id_main, 
  field1, 
  (select count(id_main) as cpt 
   from main_logs 
   where main_logs.id_main=main.id_main 
   group by path 
   order by cpt desc limit 1) as CPTs
from main 
where filter in (1,3,5);

Results:

ID_MAIN     FIELD1  CPTs
1       h   1
3       y   (null)

Edit to provide max count per path for each ID

Definitely not the most elegant query. Too may joins and subqueries could result in rather ridiculous performance laggging.

Follow your Sample data, except that table main, id = 3, pink filter = 5. So it complies with your filter criteria. However even without that critieria, the following query seems to work well for the logic.

Query:

select a.id, b.path, a.mx
from
(select x.id, x.path, max(x.ct) as mx
from (
select m.id_main as id, ml.path, 
count(ml.id_main) as ct
from main m
left join 
main_logs ml
on ml.id_main = m.id_main
group by ml.path) as x
group by x.id) as a
inner join 
(select m.id_main as id, m.filter, ml.path, 
count(ml.id_main) as ct
from main m
left join 
main_logs ml
on ml.id_main = m.id_main
group by ml.path) as b
on a.id = b.id
and a.mx = b.ct
where b.filter in (1,2,3)
order by a.mx desc
;

Results:

ID  PATH    MX
1   home2   2
2   house7  2
3   casee   1
bonCodigo
  • 14,268
  • 1
  • 48
  • 91
0
SELECT 
    m.id_main,
    m.field1,
    ml.path,
    IFNULL(ml.Count,0)
FROM main as m
LEFT JOIN (
           SELECT
                 id_main, 
                 path,
                 COUNT(path) as Count
           FROM main_logs
           GROUP BY id_main
          ) as ml on ml.id_main = m.id_main
Muhammad Raheel
  • 19,823
  • 7
  • 67
  • 103
  • same answer here : I think it could works on small dataset, but in my case mysql gone crazy (ERROR 126 (HY000): Incorrect key file for table '/tmp/#sql_11dc_1.MYI'; try to repair it) because the main_log table is huge (5Go). This query need to execute the subselect group by on all the table, without any condition. I can't do that – benfromaix Dec 25 '12 at 17:00