1

I know Oracle (I am working on Oracle11gR2) can convert data-types into each other implicitly if it is able to do so. For example if I try to insert a number into varchar area, it implicitly converts number into a varchar or vice versa a varchar would be converted into a number if it is a valid number.

Also Oracle does this implicit conversion on joins. If I join a number column to a varchar column, it will implicitly convert the varchar into number and complete the query. But if there is any invalid numbers in the varchar column it would throw an ORA-01722 : Invalid Number You can see this setting by running the codes below:

create table test_1(
  id varchar2(20),
  val number);

create table test_2(
  id number,
  name varchar2(20)
  );

insert into test_1 values ('abc', 10);
insert into test_1 values ('1', 11);
insert into test_2 values (1,'abc');
insert into test_2 values (2,'def');

-- Throw error
select 
  * 
from
 test_1, test_2
where
 test_1.id = test_2.id

-- work
select 
  test_1.id, val, name 
from
 test_1, test_2
where
 test_1.id = test_2.id
 and test_1.id = '1'

You can also see and run the samples on : http://sqlfiddle.com/#!4/fdce3/9/0

Now my question, is there any option or configuration parameter to force Oracle to make this implicit conversion into varchar instead of number? Or exactly see the source of error of invalid number (which column or which join)?

I know that I can explicitly do the conversion to avoid error. Like below, but I do not want that as solution.

select 
  * 
from
 test_1, test_2
where
 test_1.id = to_char(test_2.id)

You can visit also http://sqlfiddle.com/#!4/fdce3/10 to see the code above works properly.

Thanks

Canburak Tümer
  • 993
  • 17
  • 36
  • 1
    I don't see much of a point to the question. Even if you can trick Oracle into doing an implicit conversion the way you want, it would still be doing a conversion and so I wouldn't expect any difference from a performance point of view. Also, don't use implicit join syntax; it's obsolete and makes your query hard to read. – Tim Biegeleisen Jan 13 '18 at 08:24
  • 2
    Proper solution would be: don't use VARCHAR2 data types when you like to store numbers! – Wernfried Domscheit Jan 13 '18 at 08:58
  • @WernfriedDomscheit - Amen to that! – APC Jan 13 '18 at 09:01
  • 2
    @TimBiegeleisen - The question has an obvious point: the query that fails at execution time throws an error, but it doesn't say on which column of which table it failed. If the query is complicated, it involves many tables and many columns, one could change all conditions with `TO_CHAR()`, but it will take very long and it will need a lot of testing. The OP wants to find which table and column have invalid data, not change the query. –  Jan 13 '18 at 14:53
  • Thanks for the comments! TimBiegeleisen "don't use implicit join" unfortunately the ETL tool we are using creates these joins automatically, I do not have control over it. My problem is not about the performance but the crashing code, because of the source system put some non-numeric value into the ID column. I am aware that if we need to store numeric then column type should not be VARCHAR as Wernfried Domscheit says but I do not have control over source systems' data model. – Canburak Tümer Jan 14 '18 at 09:54

2 Answers2

4

" can explicitly do the conversion to avoid error ... I do not want that as solution".

So you don't want to use good practice? Why on earth not? Although, if you're comparing a numeric column to a string column perhaps that horse has already bolted.

Anyhoo. The ORA-01722 is the database telling us some useful information: it's telling us 'you were expecting test1.id to be numeric but you should know it contains non-numeric values'. Now we are in a position to resolve the error.

First off, is this right? Do we expect test1.id to contain non-numeric values? If the answer is 'no' then we have a data quality issue (not to mention a data modelling issue) and we should raise a bug.

But if we know test1.id can legitimately contain non-numeric strings then we have to write our query accordingly. Which means we need to apply to_char() on the other side of the WHERE clause. This not only handles the error it waves a little flag to some future co-worker looking at the query: 'by the way, test1.id contains non-numeric values: crazy, huh?'

"is there any option or configuration parameter to force Oracle to make this implicit conversion into varchar instead of number?"

Suppressing exceptions is always a bad idea. We need to know when something is amiss so we can handle it properly.

"is there any option or configuration parameter to ... exactly see the source of error of invalid number (which column or which join)?"

Alas no. To be fair, it's not unreasonable to expect that you are familiar with the data model you're working with. But if you don't know which columns are causing the problem there isn't an alternative to going through the data dictionary (i.e. all_tab_columns).

If you want to find out which rows in your string column contain non-numeric values then you will need to query it. In Oracle 12cR2 there's the highly neat VALIDATE_CONVERSION() function. Find out more. If you're using an earlier version then you need to write your own, such as this function in another StackOverflow thread.


"I am working in an ETL team and I do not have any control over source data model."

Indeed, one of biggest problems with ETL is handling bad data from the source system. There are various approaches, such as loading the data into staging tables, validating it, then either publishing it to final tables or quarantined tables depending on its quality. Alternatively we can use DML error logging; that at least will isolate the records which throw the exceptions. Find out more.

APC
  • 144,005
  • 19
  • 170
  • 281
  • @mathguy - yeah you're right. That was a proper senior moment :) – APC Jan 13 '18 at 15:04
  • 1
    Thanks @APC that was a nice answer. The problem with my query is, I am working in an ETL team and I do not have any control over source data model. (Probably I will raise a bug on Monday about that DQ) They put some non-numeric data into ID column, which caused a legacy code (which is 10 years old, has more than 50 columns, 20 source tables) to crash. Since it's an old and complex query I don't want to modify it. As I've already stated we are using 11gR2, but planning to upgrade 12cR2 then I will try to utilize VALIDATE_CONVERSION() function. Thanks a lot again. – Canburak Tümer Jan 14 '18 at 09:49
-1

You don't need any special tricks, just join on only those id's that are "numbers" by matching using a regex for "only digits":

select 
  test_1.id, val, name 
from
 test_1, test_2
where
 test_1.id = test_2.id
 and regexp_like (test_1.id, '^[0-9]+$')

See regexp_like documentation.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • @Ben fixed. (I'm more used to mysql) – Bohemian Jan 13 '18 at 08:46
  • Actually, we need to use `JOIN..ON` syntax – Kaushik Nayak Jan 13 '18 at 08:47
  • 2
    This is a bad test. It would reject decimals and negatives as non-numeric. Obviously you could add those into your pattern but why bother? There are better (more performative) ways of testing whether a string can be cast to a number in Oracle, which I touch on in my response. – APC Jan 13 '18 at 08:48
  • 1
    @KaushikNayak no, we don't need `join` (although it's a good idea) because OP's query is an inner join. I did the minimum to OP's query to get it working. Improving code style does't help answer the actual question. – Bohemian Jan 13 '18 at 08:49
  • @APC It's a great test. `id` values are going to be non-negative integers. If in some weird universe ids could be both positive and negative decials, it would only require a tweak to the regex to `'^-?[0-9]+(\.[0-9]*)?'`... big deal. – Bohemian Jan 13 '18 at 08:52
  • @Ben I'm using implicit join because that is what OP uses. I wanted to make the minimum change to the qery to get it working – Bohemian Jan 13 '18 at 08:53
  • I'm worked on systems which use negative IDs (say to indicate *unknown*). Although they all used number columns to store IDs so data conversion was never a issue. However, my main point is that using regex to identify numerics is trickier in some situations. Anyway we all know that using regex to solve a problem in production code means [now we've got two problems](https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/) – APC Jan 13 '18 at 08:55
  • @apc if negative ids indicate "unknown" then the regex not matching them is a good thing. – Bohemian Jan 13 '18 at 12:51