-2

I have a table called TableName with a column Characters.

In the Characters column I have the following data with some others that need to be checked as well as these:

Apples
15
chocolate
afternoon
Joe & taffy
@fternoon
^up
(here)
`til

I want to have a SQL script that would allow me to find any characters in that column that would be invalid for XML or SQL. Sort of like a patindex.

So I would like this to show something like this:

Joe & Taffy, line number 5, 4th character

@fternoon, line number 6, 1st character

D'antonio, line number 452, 2nd character

Ch@rles', line number 947, 3rd character and 8th character.

KingSchmoyer
  • 13
  • 1
  • 4

1 Answers1

2

If try_convert is supported by your version of MS SQL Server?
Then you can try to convert the string to an XML.
If TRY_CONVERT couldn't convert to XML, then it would return NULL.

And you can use a character set [...] in a PATINDEX.
If you'd want to locate the first position of the usual suspects.

Example :

declare @TestTable table (id int identity(1,1) primary key, Characters varchar(30));

insert into @TestTable (Characters) values
('Apples'),('15'),('chocolate')
,('@fternoon'),('^up'),('(here)'),('`til')
,('after < noon')
,('Joe & taffy')
;

select *, patindex('%[&<]%',Characters) as FirstBadCharPosition
from @TestTable
where try_convert(XML,concat('<x>',Characters,'</x>')) is null;

Result:

id  Characters      FirstBadCharPosition
--  ----------      --------------------
8   after < noon    7
9   Joe & taffy     5
LukStorms
  • 28,916
  • 5
  • 31
  • 45
  • The question lies with data that is unknown so as an example, I listed a few things from the column. There are hundreds of more items in that table besides apple, 15, chocolate etc.. I need it to look through everything without me having to insert into if that makes sense. Shouldnt the result show the ' for the til also? – KingSchmoyer Jul 27 '18 at 16:26
  • If you read [this older SO post](https://stackoverflow.com/questions/730133/invalid-characters-in-xml) and the answers then you notice that the are only a few characters that have to be escaped with `xml entities` for XML compliance. F.e. `&` to `&`. And there are unicode ranges for valid characters in the [xml specification](http://www.w3.org/TR/xml/#charsets). And normally the XML data type in Sql Server wouldn't accept invalid xml. – LukStorms Jul 27 '18 at 17:36
  • Is there a way to get that result but without listing everything in the column? So make it look at everything in the column. – KingSchmoyer Jul 28 '18 at 02:09