
I'm consuming a web service using the RestSharp library. I don't have any control over the web-service implementation, as it is a third-party service (Taleo Business Edition).

My issue is that there is some bad data which contains invalid characters. A lot of this data is copy/pasted from documents and I can't force the users to go back and clean this up. It doesn't help that the bad character is an invisible control code (0x01).

The only solution I can think of is to add a pre-processing step before RestSharp attempts to deserialize the XML. I would really like to avoid writing my own XML deserializer.
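
As a rough illustration of what such a pre-processing step could look like, independent of RestSharp, here is a minimal sketch that drops raw characters that are not legal in XML (such as 0x01) from a string. `XmlSanitizer` and `StripInvalidXmlChars` are hypothetical names made up for this example; only `XmlConvert.IsXmlChar` and the `char` helpers come from the .NET framework.

```csharp
using System.Text;
using System.Xml;

// Hypothetical helper for illustration; not part of RestSharp.
public static class XmlSanitizer
{
    // Returns a copy of the input without the characters that
    // XmlConvert.IsXmlChar rejects (e.g. the control code 0x01).
    public static string StripInvalidXmlChars(string xml)
    {
        if (string.IsNullOrEmpty(xml))
        {
            return xml;
        }

        var cleaned = new StringBuilder(xml.Length);
        for (int i = 0; i < xml.Length; i++)
        {
            char c = xml[i];
            if (char.IsHighSurrogate(c) && i + 1 < xml.Length && char.IsLowSurrogate(xml[i + 1]))
            {
                // Keep well-formed surrogate pairs (characters above U+FFFF) intact.
                cleaned.Append(c).Append(xml[i + 1]);
                i++;
            }
            else if (XmlConvert.IsXmlChar(c))
            {
                cleaned.Append(c);
            }
        }
        return cleaned.ToString();
    }
}
```

This only handles raw characters in the payload. It does not touch numeric character references such as `&#x1;`, which need a separate clean-up.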

I've looked at extending the XmlSerializer class but it doesn't seem that any of the virtual methods would be useful for adding in this pre-processing step.

I've also looked at the OnBeforeDeserialization callback in the RestSharp library, but I don't see what I could do there that would allow me to pre-process the XML.

I feel like I'm missing something basic here because it seems like something that would be a common use-case for consuming a RESTful web service.

Mike D.

3 Answers


Unfortunately, using OnBeforeDeserialization doesn't allow you to preprocess the content. Changes made there to the Content or RawBytes properties are not passed on to the deserializer. This explains why none of the regex solutions seemed to have any effect when trying to clean my XML.

In order to modify the content you have to use a custom XML deserializer. Fortunately this is easier than I thought as you can extend RestSharp.Deserializers.XmlDeserializer and override the Deserialize<T> method. You can then modify response.Content before passing it to the base function.


The solution I ended up using:

using System.Text.RegularExpressions;
using RestSharp;
using RestSharp.Deserializers;

class CustomXmlDeserializer : XmlDeserializer {
    // Matches the "#x.." part of hex character references that are not legal XML characters.
    // XML 1.0 alternative: @"&#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|7F|8[0-46-9A-F]9[0-9A-F])"
    private static readonly Regex InvalidReference = new Regex(
        @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])", // XML 1.1
        RegexOptions.IgnoreCase);

    public override T Deserialize<T>(IRestResponse response) {
        // Removing the "#x.." part of an invalid reference leaves a bare "&;" behind...
        response.Content = InvalidReference.Replace(response.Content, string.Empty);
        // ...so strip those leftovers before handing the content to the base deserializer.
        response.Content = response.Content.Replace("&;", string.Empty);

        return base.Deserialize<T>(response);
    }
}

Based on this answer: https://stackoverflow.com/a/8331749/201021


My main issue was a whole heap of invalid xml entities in the document. I never saw any actual invalid control code characters. But I had a lot of things like &#x0; and &#x4 and things like that. This meant I couldn't use solutions that only escaped specific character values.

When I attempted to use the regular expressions above in OnBeforeDeserialization, they didn't seem to work at all. The issue wasn't actually with the regular expressions but with the fact that you cannot modify the Content property there.

This solution may be too localized for others but you should be able to modify the preprocessing code here to achieve the result you need.
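
To make the two replacement steps in the deserializer above easier to try out, here is a standalone sketch that applies the same XML 1.1 pattern and the `"&;"` clean-up to a plain string, without RestSharp. `EntityCleanupDemo` and `Clean` are hypothetical names used only for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; mirrors the two replacement steps of the deserializer above.
public static class EntityCleanupDemo
{
    // Same XML 1.1 pattern as in the answer: matches the "#x.." part of an invalid reference.
    private static readonly Regex InvalidReference = new Regex(
        @"#x((10?|[2-F])FFF[EF]|FDD[0-9A-F]|[19][0-9A-F]|7F|8[0-46-9A-F]|0?[1-8BCEF])",
        RegexOptions.IgnoreCase);

    public static string Clean(string xml)
    {
        // Step 1: "&#x1;" becomes "&;" once the "#x1" part is removed.
        string withoutCodes = InvalidReference.Replace(xml, string.Empty);
        // Step 2: drop the leftover "&;" fragments.
        return withoutCodes.Replace("&;", string.Empty);
    }
}
```

For example, `EntityCleanupDemo.Clean("<name>Jo&#x1;hn &#x4;Doe</name>")` yields `<name>John Doe</name>`. One thing to watch: the pattern is not anchored on the closing `;`, so it can also clip longer, valid references. `&#x1F600;`, for instance, would lose its `#x1F` prefix and become `&600;`. If your data can contain such references, you would likely want to tighten the pattern.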

Mike D.
  • This solution worked for me to transform a bunch of > and < values in the content of my root node to actually being deserialized to my object. – Dominic Bindley Aug 19 '16 at 18:43

I think you're on the right track with OnBeforeDeserialization.

What about:

request.OnBeforeDeserialization = resp =>
{
   // here, resp.Content is the xml in string. Just erase the invalid characters
   // resp.Content = resp.Content.Replace(..., "")          
};
Phil-R
  • Unfortunately this solution didn't actually work for me as modifying the `Content` or `RawBytes` properties in `OnBeforeDeserialization` has no effect. It doesn't seem like they're actually passed through. The only solution is a custom XML deserializer. – Mike D. Sep 24 '15 at 12:55

This solution helped me. I had to replace special characters before passing the response to DotNetXmlDeserializer.

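// Body of a custom Deserialize<T>(IRestResponse response) method.
// Strip every ampersand so malformed entities cannot break XML parsing.
// Caution: this also removes the "&" from valid entities such as &amp; and &lt;.
string filteredContent = response.Content.Replace("&", string.Empty);

// Wrap the cleaned content in a fresh response and let RestSharp's
// DotNetXmlDeserializer (an instance, not a static class) do the parsing.
RestResponse modifiedResponse = new RestResponse { Content = filteredContent };
return new DotNetXmlDeserializer().Deserialize<T>(modifiedResponse);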