2

I am writing a solution for a user that matches a list of phone numbers they enter against a customer database.

The user needs to enter a comma separated list of phone numbers (integers), and the query needs to tell the user which phone numbers from their list are NOT in the database.

The only way I could think to do this is by first creating a subset NUMBER_LIST that includes all of the phone numbers that I can join and then exclude that list from what I bring back from my customer database.

WITH NUMBER_LIST AS (
    SELECT INTEGERS 
    FROM (
        SELECT level - 1 + 8000000000 INTEGERS
           FROM dual
        CONNECT BY level <= 8009999999-8000000000+1
    )
    WHERE INTEGERS IN (8001231001,8001231003,8001231234,8001231235,...up to 1000 phone numbers)
)

The problem here is the above code works fine to create my subset, for numbers between 800-000-0000 and 800-999-9999. The phone numbers in my list and customer database can be ANY range (not just 800 numbers). I did this just as a test. It takes about 6 seconds to generate the subset from that query. If I create the CONNECT BY LEVEL to include all numbers from 100-000-0000 to 999-999-9999 that is running my query out of memory to create a subset that large (and I believe it is ridiculously overkill to create a huge list and break it down using my IN statement).

The problem is creating the initial subset. I can handle the rest of the query, but I need to be able to generate the subset of numbers to query against my customer database from my IN statement.

Few things to remember:

  1. I don't have the ability to load the numbers in a temporary table first. The user will be entering the "IN(...,...,...)" statement themselves.
  2. This needs to be a single statement, no extra functions or variable declarations
  3. The database is Oracle 10g, and I am using SQL Developer to create the query.
  4. The user understands that they can only enter 1000 numbers into the IN statement. This needs to be robust enough to select any 1000 numbers from the entire area code range.
  5. The end result is to get a list of phone numbers that ARE NOT in the database. A simple NOT IN... will not work, because that will bring back which numbers are in the database, but not in my list.

How can I make this work for all numbers between 1000000000-9999999999 (or all U.S. 10-digit phone number possibilities). I may be going about it completely wrong to generate my initial HUGE list and then excluding everything other than my IN statement, but I'm not sure where to go from here.

Thanks so much for your help in advance. I've learned so much from all of you.

muncherelli
  • 2,887
  • 8
  • 39
  • 54
  • You could try adding a between which would limit the range the in must look through. WHERE INTEGERS BETWEEN lowest_number and highest_number – Michael Eakins Aug 02 '11 at 14:33

4 Answers4

3

You could use the following:

SELECT *
  FROM (SELECT regexp_substr(&x, '[^,]+', 1, LEVEL) phone_number
           FROM dual
         CONNECT BY LEVEL <= length(&x) - length(REPLACE(&x, ',', '')) + 1)
 WHERE phone_number NOT IN (SELECT phone_table.phone_number 
                              FROM phone_table)

The first query will build a list with the individual phone numbers.

Vincent Malgrat
  • 66,725
  • 9
  • 119
  • 171
  • Nice use of regexes to simplify my query - I should have thought of that, but I just copied and pasted from my old answer and was too lazy to try and simplify it further. Assuming the tel_table contains many more numbers than the numbers passed in to check, it will probably be better using a NOT EXISTS to check the numbers via the index instead of the MINUS operation, but the real trick to this is splitting the comma separated list into rows to join against. – Stephen ODonnell Aug 02 '11 at 14:51
  • @Stephen: I agree: NOT IN or NOT EXISTS should produce better plans than MINUS – Vincent Malgrat Aug 02 '11 at 15:19
2

This problem is very closely related to the 'how do I bind an in list' problem, which has come up on here a few times. I posted an answer Dynamic query with HibernateCritera API & Oracle - performance in the past.

Something like this should do what you want:

create table phone_nums (phone varchar2(10));

insert into phone_nums values ('12345');

insert into phone_nums values ('23456');

with bound_inlist
  as
  (
  select
    substr(txt,
           instr (txt, ',', 1, level  ) + 1,
           instr (txt, ',', 1, level+1) - instr (txt, ',', 1, level) -1 )
           as token
    from (select ','||:txt||',' txt from dual)
  connect by level <= length(:txt)-length(replace(:txt,',',''))+1
  )
  select *
from bound_inlist a
where  not exists (select null from phone_nums where phone = token); 

Here the list of comma separated phone numbers is bound into the query, so you are using bind variables correctly, and you will be able to enter probably an unlimited number of phone numbers to check in one go (although I would check both the 4000 and 32767 character boundaries to be sure).

Community
  • 1
  • 1
Stephen ODonnell
  • 4,441
  • 17
  • 19
1

If you are constrained to the point that it MUST be solved with IN (n1,n2,n3,...,n1000), then your approach would appear to be the only solution.

As you mentioned though, that's a big list you're creating up front.

Are you able to adapt your approach slightly?

WITH NUMBER_LIST (number) AS (
            SELECT n1    FROM DUAL
  UNION ALL SELECT n2    FROM DUAL
  UNION ALL SELECT n3    FROM DUAL
  ...
  UNION ALL SELECT n1000 FROM DUAL
)
mwigdahl
  • 16,268
  • 7
  • 50
  • 64
MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • Yeah, I can't think of a smarter way around this. Feels rather like we've painted ourselves into a corner, though. My only thoughts: 1) is it actually practical just to shove all those numbers into a table once, up front? Disk is cheap, and it'll be faster. 2) Depending on the ranges of numbers, is there a way of limiting yourself to generating the numbers between the minimum number provided and the maximum number provided, to speed things up when that's less than the whole list? I'm guessing that subset will be significantly smaller. – Matt Gibson Aug 02 '11 at 14:31
  • Ten Billion records is still alot, even for a narrow (1 column) lookup table. That said, it's better than creating a billion records every time. As for generating the list in a more intellegent way, you'd need to describe your environment more; How is the final query being built? If it's in another language, what's stopping you building the 1000 item UNION? Or executing a prepared query upto 1000 times and aggregating the results in the client? etc, etc? – MatBailie Aug 02 '11 at 14:35
  • 1
    Yes; this whole problem gives me the feeling that a nice big step backwards is needed, rather than just carrying on in the current direction. "Assuming I have no alternative but to cut my leg off, which saw would be best?" – Matt Gibson Aug 02 '11 at 14:40
1

You say you can't use temp tables or procs or custom functions -- it would be a simple task if you could.

What's the client tool being used to submit this query? Is there a reason why you can't query all phone numbers from the database and do the compare on the client?

Bob Probst
  • 9,533
  • 8
  • 32
  • 41