7

I have a Postgresql 9.3 database that is encoded 'UTF8'. However, there is a column in database that should never contain anything but ASCII. And if non-ascii gets in there, it causes a problem in another system that I have no control over. Therefore, I want to add a constraint to the column. Note: I already have a BEFORE INSERT trigger - so that might be a good place to do the check.

What's the best way to accomplish this in PostgreSQL?

David S
  • 12,967
  • 12
  • 55
  • 93

2 Answers2

10

You can define ASCII as ordinal 1 to 127 for this purpose, so the following query will identify a string with "non-ascii" values:

SELECT exists(SELECT 1 from regexp_split_to_table('abcdéfg','') x where ascii(x) not between 1 and 127);

but it's not likely to be super-efficient, and the use of subqueries would force you to do it in a trigger rather than a CHECK constraint.

Instead I'd use a regular expression. If you want all printable characters then you can use a range in a check constraint, like:

CHECK (my_column ~ '^[ -~]*$')

this will match everything from the space to the tilde, which is the printable ASCII range.

If you want all ASCII, printable and nonprintable, you can use byte escapes:

CHECK (my_column ~ '^[\x00-\x7F]*$')

The most strictly correct approach would be to convert_to(my_string, 'ascii') and let an exception be raised if it fails ... but PostgreSQL doesn't offer an ascii (i.e. 7-bit) encoding, so that approach isn't possible.

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
5

Use a CHECK constraint built around a regular expression.

Assuming that you mean a certain column should never contain anything but the lowercase letters from a to z, the uppercase letters from A to Z, and the numbers 0 through 9, something like this should work.

alter table your_table
add constraint allow_ascii_only
check (your_column ~ '^[a-zA-Z0-9]+$');

This is what people usually mean when they talk about "only ASCII" with respect to database columns, but ASCII also includes glyphs for punctuation, arithmetic operators, etc. Characters you want to allow go between the square brackets.

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185