0

I know the "XML parsing: semicolon expected" error is a data error, and not a SQL error. The first error I am trying to get rid of is the one about DTD subsets, so I am using CONVERT(xml, xml_data, 2), but then I get this: Msg 9411, Level 16, State 1, Line 14 XML parsing: line 1, character 94, semicolon expected

I don't have access to modify the actual xml. I am querying. It is data in a table (with >1 million records, so I don't even know which record is throwing the error), received from a vendor as responses to API calls. I am trying to shred the xml in a stored procedure, in order to extract bits of data. Is it possible to somehow bypass the records with the bad data? I am NOT processing row by row at the moment, nor do I want to.

Thanks!

I tried this, but it didn't work, either:

SELECT s.id as SessionID  
, st.iD as TransmissionID  
, st.TransmissionTypeID  
, st.xml_data  
, CONVERT(xml, replace(cast(st.xml_data as varchar(max)),'&','&'), 2) as converted_xml  
, s.insertdt  
, st.responsecode  

INTO #Session_transmissions
FROM XML_Sessions s with (nolock)
join XML_Session_Transmissions st with (nolock) on s.id = st.tbl_Subscription_XML_SessionID
WHERE s.insertdt >= '3/1/2018`

This is what finally worked:

SELECT PrimaryKeyColumn, TRY_CONVERT(xml, VarcharColumn) AS XMLValue, VarcharColumn

FROM YourTable WHERE TRY_CONVERT(xml, VarcharColumn) IS NULL AND VarcharColumn IS NOT NULL;

  • Your problem is, that any xml based approach to get *into* the XML will need a well-formed xml in order to work. There are some tools. [This related answer](https://stackoverflow.com/a/49008497/5089204) provides some links to tools / approaches how to deal with malformed xml. It might help to show an example of such a bad xml. Maybe a simple string-replace of `&` could help... – Shnugo Mar 06 '18 at 14:38
  • But if I string replace &. won't that impact my ability to parse the xml later? Is there a way to identify the bad records, and only replace on the bad documents? – Marianne Collins Mar 06 '18 at 14:51
  • I'm pretty sure, that the *missing semicolon* is triggered by an `&` character. This character is the signal to start an entity code, the semicolon ends this: `<` encodes the `<`. If there is an `&` in the content intentionally, `&` must be used. Replacing all `&` with `&` can help, but can make things worse too. The problem is, that you have to deal with strings, which look like XML but they are not... – Shnugo Mar 06 '18 at 15:01
  • Which database are you using? Is there any code you can show us? – jhinghaus Mar 06 '18 at 15:37
  • If you do the string replace inside of a sub-query, you won't have to modify your existing data. – tgolisch Mar 06 '18 at 15:40
  • Tried this `, CONVERT(xml, replace (replace (replace (replace (replace(Transmission_Data, '&','&'), '<','<'), '>','>'), '"','"'), '''',''') , 2) as converted_xml` in my SELECT, but am still getting the same error – Marianne Collins Mar 06 '18 at 21:39
  • @MarianneCollins this is SQLServer? which Version? Can you use Try_convert? – Shnugo Mar 07 '18 at 10:49
  • @MarianneCollins btw, replacing the `<>` must destroy the XML... – Shnugo Mar 07 '18 at 13:12
  • I ended up using this: SELECT PrimaryKeyColumn, TRY_CONVERT(xml, VarcharColumn) AS XMLValue, VarcharColumn FROM YourTable WHERE TRY_CONVERT(xml, VarcharColumn) IS NULL AND VarcharColumn IS NOT NULL; – Marianne Collins Mar 09 '18 at 21:45

0 Answers0