48

I need to determine whether a given string can be interpreted as a number (integer or floating point) in an SQL statement. As in the following:

SELECT AVG(CASE WHEN x ~ '^[0-9]*.?[0-9]*$' THEN x::float ELSE NULL END) FROM test

I found that Postgres' pattern matching could be used for this. And so I adapted the statement given in this place to incorporate floating point numbers. This is my code:

WITH test(x) AS (
    VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
    ('123.456'), ('abc'), ('1..2'), ('1.2.3.4'))

SELECT x
     , x ~ '^[0-9]*.?[0-9]*$' AS isnumeric
FROM test;

The output:

    x    | isnumeric 
---------+-----------
         | t
 .       | t
 .0      | t
 0.      | t
 0       | t
 1       | t
 123     | t
 123.456 | t
 abc     | f
 1..2    | f
 1.2.3.4 | f
(11 rows)

As you can see, the first two items (the empty string '' and the sole period '.') are misclassified as being a numeric type (which they are not). I can't get any closer to this at the moment. Any help appreciated!


Update Based on this answer (and its comments), I adapted the pattern to:

WITH test(x) AS (
    VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
    ('123.456'), ('abc'), ('1..2'), ('1.2.3.4'), ('1x234'), ('1.234e-5'))

SELECT x
     , x ~ '^([0-9]+[.]?[0-9]*|[.][0-9]+)$' AS isnumeric
FROM test;

Which gives:

     x    | isnumeric 
----------+-----------
          | f
 .        | f
 .0       | t
 0.       | t
 0        | t
 1        | t
 123      | t
 123.456  | t
 abc      | f
 1..2     | f
 1.2.3.4  | f
 1x234    | f
 1.234e-5 | f
(13 rows)

There are still some issues with the scientific notation and with negative numbers, as I see now.

mvp
  • 111,019
  • 13
  • 122
  • 148
moooeeeep
  • 31,622
  • 22
  • 98
  • 187

5 Answers5

109

As you may noticed, regex-based method is almost impossible to do correctly. For example, your test says that 1.234e-5 is not valid number, when it really is. Also, you missed negative numbers. What if something looks like a number, but when you try to store it it will cause overflow?

Instead, I would recommend to create function that tries to actually cast to NUMERIC (or FLOAT if your task requires it) and returns TRUE or FALSE depending on whether this cast was successful or not.

This code will fully simulate function ISNUMERIC():

CREATE OR REPLACE FUNCTION isnumeric(text) RETURNS BOOLEAN AS $$
DECLARE x NUMERIC;
BEGIN
    x = $1::NUMERIC;
    RETURN TRUE;
EXCEPTION WHEN others THEN
    RETURN FALSE;
END;
$$
STRICT
LANGUAGE plpgsql IMMUTABLE;

Calling this function on your data gets following results:

WITH test(x) AS ( VALUES (''), ('.'), ('.0'), ('0.'), ('0'), ('1'), ('123'),
  ('123.456'), ('abc'), ('1..2'), ('1.2.3.4'), ('1x234'), ('1.234e-5'))
SELECT x, isnumeric(x) FROM test;

    x     | isnumeric
----------+-----------
          | f
 .        | f
 .0       | t
 0.       | t
 0        | t
 1        | t
 123      | t
 123.456  | t
 abc      | f
 1..2     | f
 1.2.3.4  | f
 1x234    | f
 1.234e-5 | t
 (13 rows)

Not only it is more correct and easier to read, it will also work faster if data was actually a number.

steveha
  • 74,789
  • 21
  • 92
  • 117
mvp
  • 111,019
  • 13
  • 122
  • 148
  • 1.234d+5 is a "valid" number, too. I ran into that format doing some data warehouse work a few years ago. It was in the output of an old Fortran program; it represents a double-precision float value. Whatever office software they had imported it correctly. – Mike Sherrill 'Cat Recall' Apr 25 '13 at 04:07
  • 3
    Well, my point is that if you are trying to say if given string stored in Postgres database is a valid number, the only reasonable method is to ask Postgres server itself on what it thinks about it. If it says that `1.234d+5` is not a valid number, then you cannot really cast it into valid number using Postgres means. – mvp Apr 25 '13 at 04:14
  • I'd modify it a little to handle NULLs: `FUNCTION isnumeric(anyelement)` to take any argument. Then returning dynamic value for successful operations: `DECLARE x NUMERIC; results BOOLEAN;` And set that value inside BEGIN block: `results = CASE WHEN $1 IS NULL THEN NULL ELSE TRUE END; x = $1::NUMERIC; RETURN results;` --- This means that isnumeric(NULL) will return NULL, since NULLs have no value. – vol7ron Feb 24 '14 at 21:01
  • 3
    There is easier solution to handle `NULL`s. Leave function body intact, and simply add this line to function declaration: [`RETURNS NULL ON NULL INPUT`](http://www.postgresql.org/docs/9.1/static/sql-createfunction.html) – mvp Feb 24 '14 at 21:34
  • @mvp Yeah, I was thinking there was something like that, but my knowledge of `plpgsql` is fading :) I'm still not sure if that's how NULL input should be treated; that is, NULL in NULL out. I know for my uses it generally doesn't matter (e.g., `CASE WHEN isnumeric()...`) and I'd think a person could easily `coalesece(isnumeric(),false)` if it did; but this function has come in handy – vol7ron Feb 24 '14 at 23:36
  • 2
    After re-reading documentation once again, it turns out that simply adding keyword `STRICT` is equivalent to `RETURNS NULL ON NULL INPUT` and does what you wanted. – mvp Feb 25 '14 at 07:33
  • 3
    I have edited the answer to include the `STRICT` keyword as discussed above. It's a database function; it should do the right thing with `NULL`. I tested it, and without `STRICT` it returns `true` for `NULL` which is never what anyone would want. – steveha Aug 19 '15 at 18:24
  • I think an error shouldn't be used as a control. This answer **causes** to **error** given below: **lock buffer_content is not held.** **PosstgreSQL Version:** "PostgreSQL 10.10 (Ubuntu 10.10-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0, 64-bit" – gokaysatir Jan 15 '20 at 18:01
  • @gokaysatir, are you sure it's not some other part of your query causing locking issues? – mvp Jan 15 '20 at 20:14
  • @mvp yes, indeed this happens with these steps (sorry for bad English): * I have a function index, and the indexed function calls your answer (isnumeric). * When i update a row, PostgreSQL is probably trying to refresh index, and it calls the indexed function too many times, then this error happens. * Then i updated the isnumeric function with a regular expression, then the error is gone. * I copied the regular expression from another answer and tested for my purposes. It works. The regular expression is: '^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$' – gokaysatir Jan 15 '20 at 21:13
  • @gokaysatir, consider reporting this bug [here](https://www.postgresql.org/support). This locking problem should not happen inside PLPGSQL function even if using exceptions. – mvp Jan 16 '20 at 05:27
  • The downside of this solution is that exception handing inside the function creates a subtransaction. This puts limit on using it whithing mass insert or update queries. – hovercraft Mar 16 '23 at 05:35
10

You problem is the two 0 or more [0-9] elements on each side of the decimal point. You need to use a logical OR | in the number identification line:

~'^([0-9]+\.?[0-9]*|\.[0-9]+)$'

This will exclude a decimal point alone as a valid number.

msanford
  • 11,803
  • 11
  • 66
  • 93
Mr Rho
  • 578
  • 3
  • 16
  • 4
    You're missing some escapes on your `.`s, that will match both `'1x1'` and `'x1'`. – mu is too short Apr 24 '13 at 16:27
  • Yes, I'm used to Oracle and Java, please ensure your escapes are correct, the | should be the proper POSIX OR operator and the . should be PERIOD, not the POSIX 'all characters' operator. – Mr Rho Apr 24 '13 at 17:05
-3

I suppose one could have that opinion (that it's not a misuse of exception handling), but generally I think that an exception handling mechanism should be used just for that. Testing whether a string contains a number is part of normal processing, and isn't "exceptional".

But you're right about not handling exponents. Here's a second stab at the regular expression (below). The reason I had to pursue a solution that uses a regular expression was that the solution offered as the "correct" solution here will fail when the directive is given to exit when an error is encountered:

SET exit_on_error = true;

We use this often when groups of SQL scripts are run, and when we want to stop immediately if there is any issue/error. When this session directive is given, calling the "correct" version of isnumeric will cause the script to exit immediately, even though there's no "real" exception encountered.

create or replace function isnumeric(text) returns boolean
  immutable
  language plpgsql
as $$
begin
  if $1 is null or rtrim($1)='' then
    return false;
  else
    return (select $1 ~ '^ *[-+]?[0-9]*([.][0-9]+)?[0-9]*(([eE][-+]?)[0-9]+)? *$');
  end if;
end;
$$;
r3mus n0x
  • 5,954
  • 1
  • 13
  • 34
  • You are still missing a point. Your regex will pass `1e`, which is not a number. And, it will pass 1e100000, which may look like a number, but it cannot be stored as one, and `1e100000::NUMERIC` will fail to cast. – mvp Oct 30 '18 at 23:05
  • The regex above will return false for "1e", which is correct. But you're right about really very large or very small exponentiated representations. However, numeric is defined as: up to 131072 digits before the decimal point; up to 16383 digits after the decimal point. So in my context, and I think for many other business contexts, if we're attempting to test a number larger than this, there's a serious problem, and I *want* the program to crash or throw an exception. This might not be true if I were working in astronomy or physics. For my purposes I can't throw one every time it's called. – Scott McIntosh Nov 02 '18 at 16:12
  • 1
    And strictly speaking, a number like 1e100000 *is* numeric. You just can't cast it from a string to a numeric. – Scott McIntosh Nov 02 '18 at 16:25
  • 1
    If you can't cast it to numeric, it is not numeric. – mvp Nov 02 '18 at 17:32
-3

Since PostgreSQL 9.5 (2016) you can just ask the type of a json field:

jsonb_typeof(field)

From the PostgreSQL documentation:

json_typeof(json)
jsonb_typeof(jsonb)

Returns the type of the outermost JSON value as a text string. Possible types are object, array, string, number, boolean, and null.

Example

When aggregating numbers and wanting to ignore strings:

SELECT m.title, SUM(m.body::numeric)
FROM messages as m
WHERE jsonb_typeof(m.body) = 'number'
GROUP BY m.title;

Without WHERE the ::numeric part would crash.

krivar
  • 342
  • 1
  • 3
  • 16
  • That works ok if you know the string you want to test is JSON but doesn't seem to work well with arbitrary strings. For example: compare `select jsonb_typeof('"foo"')` and `select jsonb_typeof('foo');` – jonseymour May 20 '20 at 04:59
  • Unfortunately, this does not really work. It crashes on most inputs, and even on numbers like `1.` – mvp Jun 30 '20 at 19:02
-4

The obvious problem with the accepted solution is that it is an abuse of exception handling. If there's another problem encountered, you'll never know it because you've tossed away the exceptions. Very bad form. A regular expression would be the better way to do this. The regex below seems to behave well.

create function isnumeric(text) returns boolean
    immutable
    language plpgsql
as $$
begin
    if $1 is not null then
     return (select $1 ~ '^(([-+]?[0-9]+(\.[0-9]+)?)|([-+]?\.[0-9]+))$');
    else
     return false;
    end if;
end;
$$
;
  • 3
    It is NOT abuse of exception handling. It only wraps around single operator: `x = $1::NUMERIC;`, which is designed to cast parameter as numeric value. If any problem happens with this cast, it means `$1` was NOT a number, and thats all we care about to get our answer. There is no way to get any other exception here. Your attempt still won't work - it will not parse `1e6`, which is a valid number. – mvp Oct 25 '18 at 01:00
  • Your solution returns doesn't work for scientific notations, e.g. `1e-5` or `1.e5` or `'Nan'` which the accepted solution properly detects as a valid numeric value. –  Jun 10 '21 at 20:27