3

Currently i have this:

insert into temp select  * from myTable where (called_phone in
(
    select number1 from
    (
        SELECT *  FROM
        (
            SELECT called_phone as number1, count(*) as conto
            FROM myTable
            GROUP BY called_phone

        ) AS subquery
        union 
        SELECT * FROM
        (
            SELECT calling_phone as number1, count(*) as conto
            FROM myTable
            GROUP BY calling_phone

        ) AS subquery1
    )as subquery3
    GROUP BY number1
    having sum(conto) > 4000
))
or
(calling_phone in 
    (
        select number1 from
        (
            SELECT *  FROM
            (
                SELECT called_phone as number1, count(*) as conto
                FROM myTable
                GROUP BY called_phone

            ) AS subquery
            union 
            SELECT * FROM
            (
                SELECT calling_phone as number1, count(*) as conto
                FROM myTable
                GROUP BY calling_phone

            ) AS subquery1
        )as subquery3
        GROUP BY number1
        having sum(conto) > 4000
    )
);

i have 2 columns (called and calling phone) in which i must check all the occurrency for each number, and store in another table, all the numbers that have the count(*) > 4000 summing the occurence in both the columns. Problem with this query is that i'm doing 2 times the subquery which, by itself scan 2 times MyTable. I was thinking to store the subquery in a temporary table, and then scan it. Is this the best approach? What would you suggest?

EDIT : i'm using MySQL 5.7 with MyISAM as engine

EDIT2 : tried this:

create table test (`number1` VARCHAR(255) not NULL, primary key (`number1`));

insert into test(number1) select number1 from 
    (
        SELECT *  FROM
        (
            SELECT called_phone as number1, count(*) as conto
            FROM myTable
            GROUP BY called_phone

        ) AS subquery
        union 
        SELECT * FROM
        (
            SELECT calling_phone as number1, count(*) as conto
            FROM myTable
            GROUP BY calling_phone

        ) AS subquery1
    )as subquery3
    GROUP BY number1
    having sum(conto) > 4000;



insert into temp select  * from myTable where (called_phone in
(
    select number1 from test
    ))
or
(calling_phone in 
    (
        select number1 from test
));

drop table test;

But this is much more slower (at least on my test data, which is a table of ~14 records)

  • first approach takes 350ms to 380 ms
  • second approach take 800ms to 1.8s

FINAL EDIT : I'm writing the query that is producing the best results, and it comes from the answer of @Daniel E.

insert into temp
SELECT t1.* myTable t1 
INNER JOIN 
 (
   select number1 from
        (
            SELECT *  FROM
            (
                SELECT called_phone as number1, count(*) as conto
                FROM myTable
                GROUP BY called_phone

            ) AS subquery
            union 
            SELECT * FROM
            (
                SELECT calling_phone as number1, count(*) as conto
                FROM myTable
                GROUP BY calling_phone

            ) AS subquery1
        )as subquery3
        GROUP BY number1
        having sum(conto) > 4000
 ) t2 ON (t2.number1 = t1.called_phone
        OR 
         t2.number1 = t1.calling_phone)
  • first approach takes 350ms to 380 ms
  • second approach take 800ms to 1.8s
  • last approach take 315 to 335 ms
Daniele Sartori
  • 1,674
  • 22
  • 38

3 Answers3

1

As you seems to understand the Insert Into I will just rewrite the select, instead of using a IN and a OR, I use 1 inner join :

SELECT t1.* FROM myTable t1 
INNER JOIN 
 (
   select number1 from
        (
            SELECT *  FROM
            (
                SELECT called_phone as number1, count(*) as conto
                FROM myTable
                GROUP BY called_phone

            ) AS subquery
            union 
            SELECT * FROM
            (
                SELECT calling_phone as number1, count(*) as conto
                FROM myTable
                GROUP BY calling_phone

            ) AS subquery1
        )as subquery3
        GROUP BY number1
        having sum(conto) > 4000
 ) t2 ON (t2.number1 = t1.called_phone
        OR 
         t2.number1 = t1.calling_phone)
Daniel E.
  • 2,440
  • 1
  • 14
  • 24
  • When i combine both Inner join from your answer i get an empty table – Daniele Sartori Feb 26 '18 at 10:33
  • i think it's because (but i might be wrong) that you will never find an intersection between the 2 join. The number will be present in only 1 column every time, it can't be in both. What i need to do is to get all the numbers that have more then 4000 (this number is not really important at the moment) occurrence summing all the occurrence in both columns – Daniele Sartori Feb 26 '18 at 11:12
  • It works, but seems that my original query is still the fastest approach. You query take approximately 520 ms while mine 370 ms. I guess that i can't do much better then what i've already done – Daniele Sartori Feb 26 '18 at 13:17
  • I found that if you remove distinct is better. I'm marking your answer as the accepted one, but you should modify it – Daniele Sartori Feb 26 '18 at 13:22
  • A "or" in a Join is quite expensive, that's why I am not satisfy with my answer but I don't know what we can do. Not sure you should accept my answer. – Daniel E. Feb 26 '18 at 16:42
  • well it's the best answer so far, slightly better then my current query. I will have too see how it performs on huge tables – Daniele Sartori Feb 26 '18 at 16:44
  • All your test are with SQL_no_cache ? – Daniel E. Feb 26 '18 at 16:45
  • no i was testing them on DBeaver like i would implement them in my code – Daniele Sartori Feb 26 '18 at 16:50
  • Add it in the query to test all solutions : SELECT SQL_NO_CACHE .... because if you don't mysql may use some cache and the time won't be relevant – Daniel E. Feb 26 '18 at 16:55
  • thank you for your suggestion. I have now enough elements to proceed in my tests. I have builted 3 different query, and i will test them as soon as possible – Daniele Sartori Feb 26 '18 at 16:58
0

You could use the WITH AS ... strategy which is often used in recursive statements.

(Please forgive me for wrong intendation or braces)

       with numbersRelation as (
            select number1 from
            (
                SELECT *  FROM
                (
                    SELECT called_phone as number1, count(*) as conto
                    FROM myTable
                    GROUP BY called_phone

                ) AS subquery
                union 
                SELECT * FROM
                (
                    SELECT calling_phone as number1, count(*) as conto
                    FROM myTable
                    GROUP BY calling_phone

                ) AS subquery1
            ) as subquery3
            GROUP BY number1
            having sum(conto) > 4000
        )
        insert into temp select  * from myTable where (called_phone in numbersRelation or
        calling_phone in numbersRelation)
emmics
  • 994
  • 1
  • 11
  • 29
-1

try:

insert into temp 
select  * from myTable where called_phone in (
    select number1 from (
        SELECT called_phone as number1
        FROM myTable
        union all
        SELECT calling_phone as number1
        FROM myTable ) subquery
    GROUP BY number1
    having count(number1) > 4000 )

union 

select  * from myTable where calling_phone in (
    select number1 from (
        SELECT called_phone as number1
        FROM myTable
        union all
        SELECT calling_phone as number1
        FROM myTable ) subquery
    GROUP BY number1
    having count(number1) > 4000 )

or maybe faster will be:

insert into tabletemp 
    select number1 from (
        SELECT called_phone as number1
        FROM myTable
        union all
        SELECT calling_phone as number1
        FROM myTable ) subquery
    GROUP BY number1
    having count(number1) > 4000 ) ;

insert into temp 
select  * 
from myTable as mt 
join tabletemp tbt on (mt.called_phone = tbt.number1 or mt.calling_phone = tbt.number1);

drop table tabletemp;
areklipno
  • 538
  • 5
  • 11