2

I am attempting to query a rather large IP/City (3,000,000+ rows) table based on an IP address. My source IP address is dotted notation such as 127.0.0.1 and the table has two fields stored as integers such as 2130706433. So I need to convert the dotted notation to the integer in my where clause like this;

select get_ip_integer('74.253.103.98') ,icb.*,icl.* 
from ip_city_block icb, ip_city_location icl 
where get_ip_integer('74.253.103.98') between  icb.startipnum and  icb.endipnum  and
      icl.locid = icb.locid;

This query takes over 4 ~(4.33) seconds on a reasonably fast DB. The following query takes .062 seconds;

select 1258121058,icb.*,icl.* 
from ip_city_block icb, ip_city_location icl 
where icb.startipnum <= 1258121058 and  icb.endipnum >= 1258121058 and
      icl.locid = icb.locid;

The only difference is I replace the function get_ip_integer with the value that the function returns. If all I were doing is one lookup I would execute the second query and be done with it, but I'm not.

I actually want to join to another table that contains many ip addresses in dotted format, and when I do it takes forever. For fun I also tried;

select ip_integer ,icb.*,icl.* 
from (select get_ip_integer('74.253.103.98') ip_integer from dual),ip_city_block icb, ip_city_location icl 
where icb.startipnum <= ip_integer and  icb.endipnum >= ip_integer and
      icl.locid = icb.locid;

and this took ~4.33 seconds as well.

So the question is how can I force the get_ip_integer function to execute only once and use the results for the comparison?

I updated my function to Deterministic, and it seemed to help on the original query, but the more complex query is still unusable, performance wise. Here it is;

SELECT COUNTRY, REGION,CITY, WEBLOG_USERID, WEBLOG_IP, WEBLOG_COUNT 
FROM (
  select WEBLOG_USERID,WEBLOG_IP,get_ip_integer(WEBLOG_IP) ip_integer,count(*) WEBLOG_COUNT 
  from weblog 
  where weblog_date > '20130217' 
  group by WEBLOG_USERID,weblog_ip
),ip_city_block icb, ip_city_location icl 
where ip_integer between icb.startipnum and icb.endipnum and icl.locid = icb.locid
ORDER BY 1,2,3;

Any thoughts on this?

After a little thought on my own I came up with this, while not blindingly fast it is acceptable;

SELECT COUNTRY, REGION,CITY, WEBLOG_USERID, WEBLOG_IP, WEBLOG_COUNT 
FROM (
  select WEBLOG_USERID,WEBLOG_IP, count(*) WEBLOG_COUNT 
  from weblog 
  where weblog_date > '20130000' 
  group by WEBLOG_USERID,weblog_ip
),ip_city_block icb, ip_city_location icl 
where get_ip_integer(WEBLOG_IP) between icb.startipnum and icb.endipnum and icl.locid = icb.locid
ORDER BY 1,2,3;
Paul Stearns
  • 856
  • 9
  • 30
  • Highly related / possible dupe: http://stackoverflow.com/q/7270467/458741, basically sub-query caching, which you seem to have tried already... – Ben Feb 19 '13 at 18:20
  • What is the plan with and without the function? By default, the optimizer assumes that a condition using a custom function will return 1/20th of the rows. Whereas the optimizer may be able to make a better estimate with a hard-coded value, either by using a histogram or just the distinctness of the column. Compare the explain plans, specifically looking at the cardinality and the access paths. I would be that one plan has a nested loop and the other has a hash join. – Jon Heller Feb 19 '13 at 19:32

2 Answers2

2

Why are you using PL/SQL for this at all? From what you've said you're doing some maths, why not just do that in SQL? This'll also be possible with a combination of INSTR and SUBSTR but it's prettier to look at with REGEXP_SUBSTR.

select to_number(regexp_substr(ip, '[^.]+', 1, 1)) * power(2,24)
        + to_number(regexp_substr(ip, '[^.]+', 1, 2)) * power(2,16)
        + to_number(regexp_substr(ip, '[^.]+', 1, 3)) * power(2,8)
        + to_number(regexp_substr(ip, '[^.]+', 1, 4))
     , icb.*
     , icl.* 
  from ip_city_block icb
  join ip_city_location icl
    on icl.locid = icb.locid  
 where to_number(regexp_substr(ip, '[^.]+', 1, 1)) * power(2,24)
        + to_number(regexp_substr(ip, '[^.]+', 1, 2)) * power(2,16)
        + to_number(regexp_substr(ip, '[^.]+', 1, 3)) * power(2,8)
        + to_number(regexp_substr(ip, '[^.]+', 1, 4))
       between icb.startipnum and icb.endipnum

SQL Fiddle demonstration of REGEXP_SUBSTR output

If you have to do this in PL/SQL you should do two things:

  1. See if you can declare your function as deterministic.
  2. Try and take advantage of sub-query caching.

It appears as though you're already doing 2, but you could try and extend this by using a WITH clause:

with the_ip as ( select get_ip_integer('74.253.103.98') as ip from dual )
select the_ip.ip
     , icb.*
     , icl.* 
  from ip_city_block icb
  join ip_city_location icl
    on icl.locid = icb.locid
  join the_ip
    on the_ip.ip between icb.startipnum and icb.endipnum
Ben
  • 51,770
  • 36
  • 127
  • 149
  • Unfortunately, the data is stored as an integer computed; a*2^24 + b*2^16 + c*2^8+d, where a.b.c.d would be the ip address. Using your approach the string 11111111 could be interpreted as 1.111.1.111 or 11.11.11.11 or... – Paul Stearns Feb 19 '13 at 18:27
  • Ben, I took your query, and ran it as is, and it executes in ~4.275 seconds. – Paul Stearns Feb 19 '13 at 18:51
  • @Paul, I've updated my answer with a method of doing it in SQL rather than PL/SQL. On the 4.275s run did you declare your function as deterministic? Can it be? – Ben Feb 19 '13 at 19:14
  • I Have changed the fuction to deterministc, and it improved the performance to equal the value. However the more complex query still runs like a slug. I will update my original question with the more complex query. – Paul Stearns Feb 19 '13 at 19:37
  • @PaulStearns, there's no need to accept this answer unless it actually solved your problem. Did it? If you're question isn't solved then please wait for someone else who might be able to help. I notice you're not technically using sub-query caching in your more complicated query and have an ORDER BY. Is this necessary as you can gain speed improvements without it. – Ben Feb 19 '13 at 19:49
  • The answer I checked actually made the query usable. I was unaware of the "deterministic" attribute, and now that I know about it, I am reviewing all the other places it might be useful. So while it did not technically answer the question in the subject, it gave me what I needed and then some. – Paul Stearns Feb 19 '13 at 22:16
2

you were on the right track with dual, but for subquery caching, you do it in the select.

select (select get_ip_integer('74.253.103.98') from dual) ip,
       icb.*,icl.* 
from ip_city_block icb, ip_city_location icl 
where get_ip_integer('74.253.103.98') between  icb.startipnum and  icb.endipnum  and
      icl.locid = icb.locid;

also you should define your function with result_cache.

see here for more details: http://www.oracle.com/technetwork/issue-archive/2011/11-sep/o51asktom-453438.html

DazzaL
  • 21,638
  • 3
  • 49
  • 57
  • DazzaL, the result_cache bit cut the execution time to ~1.51 Seconds. Much better, but not down to 0.062 Seconds when I use the value. – Paul Stearns Feb 19 '13 at 19:01