1

I am using SQL Server 2008, and try to sanitize a list of urls.

Some existing examples of texts:

www.google.com
'www.google.com'
/www.google.com
www.google.com/

Ideally I can strip any leading/trailing non-alphanumeric characters so the four would give out the same output as

www.google.com
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Xiaoyu Lu
  • 3,280
  • 1
  • 22
  • 34
  • Possible duplicate of [T-SQL trim &nbsp (and other non-alphanumeric characters)](https://stackoverflow.com/questions/52315/t-sql-trim-nbsp-and-other-non-alphanumeric-characters)? – Thom A Feb 06 '19 at 17:15
  • will there only ever be a single non alpha-numeric value, or could there be multiple? – S3S Feb 06 '19 at 17:23

3 Answers3

2

Well, if you know they are only at the beginning and end, you can do:

with t as (
      select *
      from (values ('www.google.com'), ('''www.google.com'''), ('/www.google.com')) v(text)
     )
select t.text, v2.text2
from t cross apply
     (values (stuff(t.text, 1, patindex('%[a-zA-Z0-9]%', t.text) - 1, ''))
     ) v(text1) cross apply
     (values (case when v.text1 like '%[^a-zA-Z0-9]'
                   then stuff(v.text1, len(text) + 1 - patindex('%[a-zA-Z0-9]%', reverse(v.text1)), len(v.text1), '')
                   else v.text1
              end)
     ) v2(text2);

Here is a db<>fiddle.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    I think there is a parentheses missing on patindex (next to last line). [db fiddle with fix, though results aren't what I'd expect](https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=d3c9156664f0276d0f04d4733840b12e) – S3S Feb 06 '19 at 17:21
  • 1
    @scsimon . . . I always get these things wrong in SQL Server until I test them. The answer works now. – Gordon Linoff Feb 06 '19 at 17:26
  • 1
    UV from me cause it'll prob suffice their needs. It won't work for some cases, semicolon, colon, trailing braces, odd stuff. – S3S Feb 06 '19 at 17:30
  • @scsimon . . . This will remove any characters *not mentioned* in the character class in the code. So it removes and leading or trailing semicolons, colons, trailing braces and anything else that is not a-z, A-Z, or 0-9. – Gordon Linoff Feb 06 '19 at 17:33
0

Why not just use replace() ?:

SELECT REPLACE(REPLACE(col, '''', ''), '/', '')
Yogesh Sharma
  • 49,870
  • 5
  • 26
  • 52
0

You should be able to use Substring. Calculating length can be tricky:

DECLARE @temp TABLE (val varchar(100))
INSERT INTO @temp VALUES
 ('www.google.com'),('''www.google.com'''),('/www.google.com'),('www.google.com/'),('[www.google.com];')

SELECT SUBSTRING(val
                ,PATINDEX('%[a-zA-Z0-9]%', val) --start at position
                ,LEN(val) + 2 - PATINDEX('%[a-zA-Z0-9]%', val) 
                  - PATINDEX('%[a-zA-Z0-9]%', REVERSE(val)) --length of substring
                ) AS [Result]
  FROM @temp

Produces output:

Result
--------------
www.google.com
www.google.com
www.google.com
www.google.com
www.google.com
Zorkolot
  • 1,899
  • 1
  • 11
  • 8