0

I have a SQL Server connection to an external table in my application and I need to make a query where one of the columns has wrong formatting, let's say, the format is alphanumeric without symbols but the column has data with dashes, apostrophes, dots, you name it. Is it possible to just query one of the columns with that filtered out? It'd really help me. I'm using Laravel and I know I can make an accessor to clean that out but the query is heavy.

This is an example:

Data sought: 322211564

Data found: 322'211'564

Also 322-211-564

EDIT: Just to clarify, I don't want to EXCLUDE data, but to "reformat" it without symbols.

EDIT: By the way, if you're curious using Laravel 5.7 apparently you can query the accessor directly if you have the collection already. I'm surprised but it does the trick.

ffuentes
  • 1,042
  • 5
  • 16
  • 36
  • So, to confirm, you want to remove **every** non numerical character? Every non alphanumerical character? Something else? – Thom A Mar 20 '19 at 20:43
  • No, most codes are numerical but they can contain letters, it's a string in my application. What they can't have are symbols. – ffuentes Mar 20 '19 at 20:45
  • [Does any answer here work](https://stackoverflow.com/questions/18625548/t-sql-select-query-to-remove-non-numeric-characters) – S3S Mar 20 '19 at 20:46
  • 3
    Do you have a defined list of characters you want removed? Is there any chance you can fix whatever is capturing this data so it isn't filled with garbage? – Sean Lange Mar 20 '19 at 20:47
  • You could use nested replace functions to remove a known list of offending characters. It is super fast and easy to create. But if you have a more complicated list of possible bad characters then something what Larnu posted is the way to go. – Sean Lange Mar 20 '19 at 20:55

2 Answers2

3

A wild card guess, but perhaps this works:

WITH VTE AS(
    SELECT *
    FROM (VALUES('322''211''564'),
                ('322-211-564')) V(S))
SELECT S,
       (SELECT '' + token
        FROM dbo.NGrams8k(V.S,1) N
        WHERE token LIKE '[A-z0-9]'
        ORDER BY position
        FOR XML PATH('')) AS S2
FROM VTE V;

This makes use of the NGrams8k function. If you need other acceptable characters you can simply add them to the pattern string ('[A-z0-9]').

If, for some reason, you don't want to use NGrams8k, you could create an inline tally table, which will perform a similar function:

WITH N AS(
    SELECT N
    FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
    FROM N N1 --10
         CROSS JOIN N N2 --100
         CROSS JOIN N N3 --1000
         CROSS JOIN N N4 --10000 --Do we need any more than that? You may need less
    ),
VTE AS(
    SELECT *
    FROM (VALUES('322''211''564'),
                ('322-211-564')) V(S))    
SELECT V.S,
       (SELECT '' + SS.C
        FROM Tally T
             CROSS APPLY (VALUES(SUBSTRING(V.S,T.I,1))) SS(C)
        WHERE SS.C LIKE '[A-z0-9]'
        ORDER BY T.I
        FOR XML PATH(''),TYPE).value('.','varchar(8000)') AS S2
FROM VTE V;

Also, just in case, I've used the TYPE format and the value function. If you then change your mind about not wanting any special characters and need an acceptable character like &, it won't be changed to &.

Thom A
  • 88,727
  • 11
  • 45
  • 75
0

Note for pattern-based string replacements, you can use a library like SQL Server Regex. Call RegexReplace on the string you want to transform:

select RegexReplace(col, '[^A-Za-z0-9]', '') from tbl

That call will remove any non-alphanumeric character.

To find all the rows where the column contains only alphanumeric characters:

select col from tbl where col not like '%[^A-Za-z0-9]%'

The like pattern consists of:

% - Matches 0 or more characaters.

[^A-Za-z0-9] - Matches any character not in A-Z, a-z, and 0-9. The ^ symbol at the beginning of the character class means characters that do not match.

By using not like your query will reject strings that contain a non-alphanumeric character anywhere in the string.

jspcal
  • 50,847
  • 7
  • 72
  • 76
  • It's so confusing that it uses "not like". – ffuentes Mar 20 '19 at 20:52
  • 1
    This is only going to identify rows with characters not desired. The OP wants a query to return the rows but removed the offending characters. This misses that mark. – Sean Lange Mar 20 '19 at 20:54
  • BTW, what I'm asking or trying to ask for is to show the same data fro m that column but replacing the data shown on the column without the symbols. – ffuentes Mar 20 '19 at 20:54
  • Yes, that is necessary because the `like` operator is not as advanced as a full regular expression engine. – jspcal Mar 20 '19 at 20:55
  • The question as written says you want a *query that filters* those rows out. – jspcal Mar 20 '19 at 20:57