Test column for special characters or only characters / numbers

Question

I tried finding special characters using regex-style bracket patterns with LIKE and NOT LIKE, but I have been getting confusing results. My research suggests that LIKE does not work here the way it does in SQL Server or elsewhere.

For finding if there is any character
For finding if there is any number
For finding if there is any special character

LIKE '%[^0-9]%' and LIKE '%[^a-Z]%' do not work well for finding non-numeric data and non-alphabetic data, respectively:

SELECT column1 FROM some_table WHERE column1 LIKE '%[^0-9]%';
SELECT column1 FROM some_table WHERE column1 LIKE '%[^a-Z]%';
SELECT column1 FROM some_table WHERE column1 LIKE '%[^a-Z0-9]%';

I have also noticed that people use NOT LIKE '%[^0-9]%'.
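
The double negation in that idiom is easy to misread. With SQL Server's bracket wildcards, LIKE '%[^0-9]%' means "contains at least one non-digit", so NOT LIKE '%[^0-9]%' means "contains no non-digit", i.e. digits only (or an empty string). The sketch below only models that logic with Python's re module; contains_non_digit and only_digits are hypothetical names. In Postgres the same test would be written with a regex operator, for example column1 !~ '[^0-9]'.

```python
import re


def contains_non_digit(value: str) -> bool:
    """Rough analogue of SQL Server's  value LIKE '%[^0-9]%'."""
    return re.search(r"[^0-9]", value) is not None


def only_digits(value: str) -> bool:
    """Rough analogue of  value NOT LIKE '%[^0-9]%': no non-digit anywhere."""
    return not contains_non_digit(value)


print(only_digits("12345"))  # True
print(only_digits("123a5"))  # False
print(only_digits(""))       # True: the empty string has no non-digit either
```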

`LIKE` does not support regular expressions in SQL – May 07 '19 at 20:43

Erwin Brandstetter · Accepted Answer · 2023-08-01T21:58:32.340

Postgres LIKE does not support regular expressions.
You need the regular expression operator ~.
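
Postgres LIKE only knows two wildcards, % (any run of characters) and _ (exactly one character), plus the escape character (backslash by default). In the question's patterns the brackets are therefore plain characters. A minimal Python sketch that translates a LIKE pattern into a regex makes this visible; like_to_regex and like are hypothetical helpers, not part of any library, and the model ignores collations and ILIKE.

```python
import re


def like_to_regex(pattern: str, escape: str = "\\") -> str:
    """Translate a LIKE pattern into a regex body meant for re.fullmatch.

    Only % and _ are wildcards; the escape character makes the next
    character literal. Everything else, including [ ] ^ and -, is literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape:
            if i + 1 == len(pattern):
                # Postgres also rejects a pattern ending in the escape character.
                raise ValueError("LIKE pattern must not end with escape character")
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def like(value: str, pattern: str) -> bool:
    # DOTALL so that % and _ also match newlines.
    return re.fullmatch(like_to_regex(pattern), value, re.DOTALL) is not None


print(like("abc123", "%[^0-9]%"))    # False: no literal "[^0-9]" in the value
print(like("x[^0-9]y", "%[^0-9]%"))  # True: the brackets are matched literally
print(like("abc", "a_c"))            # True
```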

Standard SQL also defines SIMILAR TO, an odd mix of the two, but better not use it. See:

Pattern matching with LIKE, SIMILAR TO or regular expressions

For finding if there is any character

... meaning any character at all:

... WHERE col <> '';                        -- any character at all?

So neither NULL nor empty. See:

Best way to check for "empty or null value"
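
A WHERE clause only keeps rows for which the condition is true, and NULL <> '' evaluates to NULL, which is why this single comparison rules out both NULL and the empty string. The sketch below models that three-valued logic in Python, with None standing in for NULL; the helper names are hypothetical, and it assumes a text or varchar column, where a value consisting of a single space still counts as a character.

```python
from typing import Callable, Iterable, List, Optional


def not_equal_empty(value: Optional[str]) -> Optional[bool]:
    """Model of  col <> ''  under SQL's three-valued logic (None = NULL)."""
    if value is None:
        return None  # NULL <> '' is NULL, not true
    return value != ""


def where(
    rows: Iterable[Optional[str]],
    predicate: Callable[[Optional[str]], Optional[bool]],
) -> List[Optional[str]]:
    # WHERE keeps a row only if the predicate is true; false and NULL both drop it.
    return [row for row in rows if predicate(row) is True]


rows = ["abc", "", None, " "]
print(where(rows, not_equal_empty))  # ['abc', ' ']
```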

... meaning any alphabetic character (letter):

... WHERE col ~ '[[:alpha:]]';              -- any letters?

[[:alpha:]] is the character class for all alphabetic characters: not just the ASCII letters [A-Za-z], but also letters like [ÄéÒçòý] etc.
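
Python's re module has no POSIX bracket classes such as [[:alpha:]], so this sketch approximates it with str.isalpha() to show the difference from [A-Za-z]. Postgres classifies letters according to the database encoding and locale, so treat the Python result as an approximation; has_letter and has_ascii_letter are hypothetical names.

```python
import re


def has_letter(value: str) -> bool:
    # Roughly [[:alpha:]]: str.isalpha() is true for any Unicode letter.
    return any(ch.isalpha() for ch in value)


def has_ascii_letter(value: str) -> bool:
    # The narrower class [A-Za-z], for comparison.
    return re.search(r"[A-Za-z]", value) is not None


accented = "\u00e9\u00e7"  # the two letters e-acute and c-cedilla
print(has_letter(accented), has_ascii_letter(accented))  # True False
print(has_letter("42 / 7"), has_ascii_letter("42 / 7"))  # False False
```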

For finding if there is any number

... meaning any digit:

... WHERE col ~ '\d';                       -- any digits?

\d is the class shorthand for [[:digit:]].
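
The same check in Python needs one extra decision: for str patterns, Python's \d matches any Unicode decimal digit, not only 0-9, unless the re.ASCII flag is set. Which variant is closer to Postgres's [[:digit:]] depends on the database locale, so the sketch shows both; has_digit is a hypothetical helper.

```python
import re


def has_digit(value: str) -> bool:
    # Analogue of  col ~ '\d'. re.ASCII restricts \d to [0-9].
    return re.search(r"\d", value, re.ASCII) is not None


print(has_digit("Route 66"))   # True
print(has_digit("no digits"))  # False

# Without re.ASCII, Python's \d also matches other Unicode decimal digits,
# e.g. U+0663 ARABIC-INDIC DIGIT THREE.
print(re.search(r"\d", "\u0663") is not None)  # True
print(has_digit("\u0663"))                      # False
```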

For finding if there is any special character

... meaning anything except digits and letters:

... WHERE col ~ '\W';                       -- anything but digits & letters?

\W is the class shorthand for [^[:alnum:]_] (underscore excluded - the manual is currently confusing there).
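
For str patterns, Python's \w covers Unicode letters, digits and the underscore, so its \W behaves much like [^[:alnum:]_]. The sketch below (has_special is a hypothetical name, and Python's re stands in for the Postgres regex engine) shows that the underscore and accented letters do not count, while a hyphen or a plain space does.

```python
import re


def has_special(value: str) -> bool:
    # Analogue of  col ~ '\W': true if any character is not a letter,
    # digit or underscore (Unicode-aware for str patterns).
    return re.search(r"\W", value) is not None


# The second sample is "Größe", written with escapes.
samples = ["abc_123", "Gr\u00f6\u00dfe", "a-b", "hello world"]
print([has_special(s) for s in samples])  # [False, False, True, True]
```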

... meaning anything except digits, letters and plain space:

... WHERE col ~ '[^[:alnum:]_ ]';           -- ... and space

That's the class shorthand \W spelled out, additionally excluding plain space.
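
A Python approximation of that class (with \w standing in for [:alnum:] plus underscore; has_special_except_space is a hypothetical name) makes the scope of the exclusion concrete: only the plain space U+0020 is tolerated, so a tab still counts as special.

```python
import re


def has_special_except_space(value: str) -> bool:
    # Analogue of  col ~ '[^[:alnum:]_ ]': letters, digits, underscore and
    # the plain space are allowed; anything else is flagged.
    return re.search(r"[^\w ]", value) is not None


print(has_special_except_space("hello world"))   # False
print(has_special_except_space("hello world!"))  # True
print(has_special_except_space("tab\there"))     # True: a tab is not a plain space
```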

... meaning anything except digits, letters and any white space:

... WHERE col ~ '[^[:alnum:]_\s]';          -- ... and any white space
... WHERE col ~ '[^[:alnum:]_[:space:]]';   -- ... the same spelled out

This time excluding all white space as defined by the POSIX character class [[:space:]]. About "white space" in Unicode:

Trim trailing spaces with PostgreSQL
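
In Python the equivalent negated class is [^\w\s]. Note that Python's \s for str patterns covers Unicode white space in general (what str.isspace() accepts), which may not be exactly the set Postgres's [[:space:]] uses under your locale; has_special_except_whitespace is a hypothetical name.

```python
import re


def has_special_except_whitespace(value: str) -> bool:
    # Analogue of  col ~ '[^[:alnum:]_\s]': \s inside the negated class
    # also exempts tabs, newlines and other white space.
    return re.search(r"[^\w\s]", value) is not None


print(has_special_except_whitespace("tab\there"))    # False
print(has_special_except_whitespace("line\nbreak"))  # False
print(has_special_except_whitespace("x\ty!"))        # True
```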

... meaning any non-ASCII character:

If your DB cluster runs with UTF8 encoding, there is a simple, very fast hack:

... WHERE octet_length(col) > length(col);  -- any non-ASCII character?

octet_length() counts the bytes in the string, while length() (aliases: character_length() or char_length()) counts the characters. All basic ASCII characters ([\x00-\x7F]) are encoded with 1 byte in UTF-8; all other characters use 2 to 4 bytes. So any non-ASCII character in the string makes the expression true.
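
The byte-count trick can be checked outside the database: encoding a Python string as UTF-8 gives the same byte count that octet_length() reports in a UTF8 database, while len() counts code points. The helper names below are hypothetical and only mirror the SQL functions.

```python
def octet_length(value: str) -> int:
    # Number of bytes in the UTF-8 encoding of the string.
    return len(value.encode("utf-8"))


def has_non_ascii(value: str) -> bool:
    # Same idea as  octet_length(col) > length(col): only ASCII takes 1 byte.
    return octet_length(value) > len(value)


# "plain", "café", the euro sign and an emoji, written with escapes.
samples = ["plain", "caf\u00e9", "\u20ac", "\U0001F600"]
for s in samples:
    print(len(s), octet_length(s), has_non_ascii(s))
# 5 5 False
# 4 5 True
# 1 3 True
# 1 4 True
```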

Further reading:

Chapter Regular Expression Class-shorthand Escapes in the manual.
PostgreSQL 9.1 using collate in select statements
ERROR: “sql ” is not a known variable

Worked like a charm! Thank you so much for putting in the effort. ... WHERE column1 ~ '\W'; worked, but it also returns records with spaces. For that, I added AND column1 NOT LIKE '% %' (there is a space between the two % signs). This eliminates records which have spaces and gives me the records which have special characters – Vish_er May 08 '19 at 20:59
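
The space filter in this comment drops every row that contains a space, including rows that also contain a special character. Reading the filter as column1 ~ '\W' AND column1 NOT LIKE '% %' (since it is meant to eliminate rows with spaces), a small Python comparison against the answer's [^[:alnum:]_ ] pattern shows the difference. Python's re stands in for the Postgres regex engine, \w approximates [:alnum:] plus underscore, and the function names and sample rows are made up for illustration.

```python
import re


def commenter_filter(value: str) -> bool:
    # col ~ '\W' AND col NOT LIKE '% %': any value containing a space is dropped.
    return re.search(r"\W", value) is not None and " " not in value


def answer_filter(value: str) -> bool:
    # col ~ '[^[:alnum:]_ ]': spaces are ignored, other symbols still count.
    return re.search(r"[^\w ]", value) is not None


rows = ["New York", "St. Louis", "O'Brien", "plain"]
print([r for r in rows if commenter_filter(r)])  # ["O'Brien"]
print([r for r in rows if answer_filter(r)])     # ['St. Louis', "O'Brien"]
```

"St. Louis" is lost by the first filter only because it also contains a space.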

Gordon Linoff · Answer 2 · 2019-05-08T12:43:09.660


The problem is that you are using LIKE incorrectly. These patterns are not recognized by LIKE.

Use ~ for regular expression matching:

select column1 from some_table where column1 ~ '[^a-Z0-9]'

or more aptly:

select column1 from some_table where column1 ~ '[^a-zA-Z0-9]'

This will return every row where column1 contains a character that is not in the character class.
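
To see what that means for a column of place names, here is the same character class applied with Python's re module, which stands in for the Postgres engine (the sample city names are invented). A single space or hyphen is enough for a match, which may explain why the results in the comments below look like ordinary city names. The sketch also shows that Python refuses the reversed range a-Z from the first query outright, because 'a' (0x61) comes after 'Z' (0x5A); how Postgres treats that range is not covered here.

```python
import re


def has_char_outside_class(value: str) -> bool:
    # Analogue of  column1 ~ '[^a-zA-Z0-9]': any character that is not an
    # ASCII letter or digit, including spaces and hyphens.
    return re.search(r"[^a-zA-Z0-9]", value) is not None


cities = ["Boston", "New York", "Winston-Salem", "Zurich"]
print([c for c in cities if has_char_outside_class(c)])  # ['New York', 'Winston-Salem']

# Python's re rejects the reversed range a-Z used in the first query.
try:
    re.compile(r"[^a-Z0-9]")
    reversed_range_accepted = True
except re.error:
    reversed_range_accepted = False
print(reversed_range_accepted)  # False
```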

Here is a db<>fiddle.

edited May 08 '19 at 12:43

answered May 07 '19 at 20:41


This does not work for me. I tried ~ '[^a-Z]' to find out if I can get non-character values but it just gave me names of the cities which did not start with 'a' or 'Z'. Any thoughts? – Vish_er May 08 '19 at 12:23
@VishwalShah . . . The regular expression matches a city name that has any non-letter anywhere in the name. I would assume that it just happens to return city names that start with different characters first. – Gordon Linoff May 08 '19 at 12:26
It should return non-letter based records but it is just showing clean letter-based records. Your assumption might be right but it still does not show me any non-letter based records – Vish_er May 08 '19 at 12:34
