I want to store URLs in a column. According to RFC 3986, US-ASCII is the character set from which URLs are composed.
SQL Server has the VARCHAR type, which can encode all the characters from the US-ASCII character set, and 128 more that are dependent on the code page.
I want to use a CHECK constraint to ensure the values in the column contains only the printable characters from the US-ASCII character set; in other words, ASCII(@char) >= 32 AND ASCII(@char) < 127
for every character in the string.
I think I can use a LIKE expression to do this in a check constraint, but I can't find the right pattern. I'm trying to adapt Itzik Ben-Gan's trick of matching any character outside the allowed range, which he presents in his article Can I convert this string to an integer?.
In my test harness I create a table @TestData
of candidates to insert into my column, a table @Patterns
of patterns to be used with the LIKE operator, and then I select the result of matching each pattern against each candidate:
DECLARE @TestData TABLE (
String VARCHAR(60) COLLATE Latin1_General_CI_AS NOT NULL
);
INSERT INTO @TestData(String)
VALUES
('€ÿ'),
('ab3'),
('http://www.google.com/'),
('http://www.example.com/düsseldorf?neighbourhood=Lörick'),
('1234');
DECLARE @Patterns TABLE (
Pattern VARCHAR(12) COLLATE Latin1_General_CI_AS NOT NULL
);
INSERT INTO @Patterns (Pattern)
VALUES
('%[^0-9]%'),
('%[^' + CHAR(32) + '-' + CHAR(126) + ']%');
SELECT
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ID,
String,
Pattern,
CASE WHEN String NOT LIKE Pattern THEN 1 ELSE 0 END AS [Match]
FROM @TestData CROSS JOIN @Patterns;
The first row inserted into @Patterns
is like the pattern Itzik uses to match non-digit characters. The second row is my attempt to adapt this for characters outside the range of printable US-ASCII characters.
When I execute the above batch, I receive the following result set:
ID String Pattern Match
--- -------------------------------------------------------- ------------ ------
1 €ÿ %[^0-9]% 0
2 ab3 %[^0-9]% 0
3 http://www.google.com/ %[^0-9]% 0
4 http://www.example.com/düsseldorf?neighbourhood=Lörick %[^0-9]% 0
5 1234 %[^0-9]% 1
6 €ÿ %[^ -~]% 0
7 ab3 %[^ -~]% 0
8 http://www.google.com/ %[^ -~]% 0
9 http://www.example.com/düsseldorf?neighbourhood=Lörick %[^ -~]% 0
10 1234 %[^ -~]% 0
As expected, row 5 is a match because the candidate contains only digits. The candidates in rows 1 thru 4 do not contain only digits, so do not match the pattern.
As expected, the candidate in row 6 does not match the pattern because it contains 'high ASCII' characters.
I expect the candidates in rows 7, 8, and 10 to match because they contain only printable US-ASCII characters. But these do not match.
What is wrong with the pattern in the LIKE expression?