I have some third-party XML that I'm trying to parse.

The question is similar to this one in that I'm looking to get at pseudo xml code buried inside one of the elements. However, the result I need is different.

Here's the xml that's returned:

    HTTP/1.1 200 OK
    Content-Type: text/xml; charset=utf-8
    Content-Length: length

    <?xml version="1.0" encoding="utf-8"?>
    <soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <PostApplication_V6Response xmlns="http://xxxService.org/">
          <PostApplication_V6Result>string</PostApplication_V6Result>
        </PostApplication_V6Response>
      </soap:Body>
    </soap:Envelope>

I'm using LINQ to XML. I can return the element <PostApplication_V6Result>, which is the lowest element in the tree I can retrieve.

Using this code:

    var name = "{http://xxxService.org/}PostApplication_V6Result";

    var soap = XDocument.Parse(result)
        .Descendants(name)
        .First();

However, the value contained within that element is some kind of pseudo xml - not xml but xml lookalike.

Here's what's contained inside:

    <xxxService>
        <Application>
            <Status>Accepted</Status>
            <RedirectUrl>http://www.google.com?abc=123</RedirectUrl>
            <Value>100</Value>
        </Application>
    </xxxService>

I've tried just about everything to get the data out, but I get either an invalid character '=' error or a "data at the root level is invalid" message.

Ideally, I want to get the data contained within the "Application" node into a state where I can run it through a generic parser like the one below, but if I have to do something manually I will. I've been trying to solve this for a couple of days now.

    public static T Deserialise<T>(this XElement element)
    {
        var serializer = new XmlSerializer(typeof(T));

        using (var reader = element.CreateReader())
        {
            return (T)serializer.Deserialize(reader);
        }
    }

Any help appreciated.

UPDATE

Here's the full XML that's returned. As you can see, the inner portion is in fact HTML, not XML.

<soap:body><postapplication_v6response xmlns="http://xxxService.org/"><postapplication_v6result>&lt;xxxService&gt;
&lt;Application&gt;
&lt;Status&gt;PURCHASED&lt;/Status&gt;
&lt;RedirectURL&gt;http://www.google.com?test=abc&amp;xyz=123&lt;/RedirectURL&gt;
&lt;/Application&gt;
&lt;/xxxService&gt;
</postapplication_v6result></postapplication_v6response></soap:body></soap:envelope>
John Ohara
  • can you post the full xml? it sounds like your service is not returning valid xml? – Ewan Apr 21 '15 at 08:17
  • if the content of the node is a string which contains xml characters you should wrap it in a [CData] or escape it – Ewan Apr 21 '15 at 08:20
  • @Ewan - added the code Ewan - as you can see, the inner portion is html, not xml. – John Ohara Apr 21 '15 at 08:56
  • ok thats fine, just use innerText and deescape it like in this answer http://stackoverflow.com/questions/2203485/built-in-net-function-for-unescaping-characters-in-xml-stream – Ewan Apr 21 '15 at 08:57
  • @Ewan - I'm still getting a problem with the '=' in the data when I try to reparse it. I'm doing this to get at the application node. – John Ohara Apr 21 '15 at 09:12
  • Ultimately I need to get the fields into a class, so I can use them. – John Ohara Apr 21 '15 at 09:16
  • why would you encode an object as html? – Ewan Apr 21 '15 at 09:17
  • if you can get a wsdl for this service visual studio will do it all for you. just add a webservice reference – Ewan Apr 21 '15 at 09:20

2 Answers

Here is an example (I've taken out the namespaces):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Web;
using System.Xml.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace UnitTestProject2
{
    [TestClass]
    public class Class7
    {
        [TestMethod]
        public void xmltest()
        { 
            string xml = @"<body><postapplication_v6response><postapplication_v6result>&lt;xxxService&gt;
&lt;Application&gt;
&lt;Status&gt;PURCHASED&lt;/Status&gt;
&lt;RedirectURL&gt;http://www.google.com?test=abc&amp;xyz=123&lt;/RedirectURL&gt;
&lt;/Application&gt;
&lt;/xxxService&gt;
</postapplication_v6result></postapplication_v6response></body>";

            XDocument doc = XDocument.Parse(xml);
            string encodedhtml = doc.Descendants("postapplication_v6result")
                    .First().Value;

            string decodedhtml = HttpUtility.HtmlDecode(encodedhtml);

            Console.WriteLine(decodedhtml);
        }
    }
}
Ewan

A side effect of decoding the entire string is that some XML special characters that need to stay encoded (the & character in this case) get decoded too, resulting in invalid XML. For this simple case, replacing & with &amp; should fix it:

var xml = @"<PostApplication_V6Result>
&lt;xxxService&gt;
&lt;Application&gt;
&lt;Status&gt;PURCHASED&lt;/Status&gt;
&lt;RedirectURL&gt;http://www.google.com?test=abc&amp;xyz=123&lt;/RedirectURL&gt;
&lt;/Application&gt;
&lt;/xxxService&gt;
</PostApplication_V6Result>";
var soap = XElement.Parse(xml);

var rawContent = HttpUtility.HtmlDecode(soap.FirstNode.ToString().Trim())
                            .Replace("&", "&amp;");
var content = XElement.Parse(rawContent);

Modify the code to encode other XML special characters if needed.
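
As a rough sketch of one way to take this further and get the fields into a class, the snippet below escapes only the ampersands that do not already start an entity, then deserialises the result with XmlSerializer. The class names XxxService, Application and PayloadParser are hypothetical, and the element names are assumed from the sample payload in the question's update. It also assumes the text is read via XElement.Value, which returns the content with entities already resolved once, so no separate HtmlDecode call is made here.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical DTOs; element names are assumed from the sample payload.
[XmlRoot("xxxService")]
public class XxxService
{
    [XmlElement("Application")]
    public Application Application { get; set; }
}

public class Application
{
    public string Status { get; set; }

    [XmlElement("RedirectURL")]
    public string RedirectUrl { get; set; }
}

public static class PayloadParser
{
    // Matches '&' that does not already begin an entity such as &amp; or &#38;
    private static readonly Regex BareAmpersand =
        new Regex(@"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);)");

    // resultElementXml: the <PostApplication_V6Result> element as a string.
    public static XxxService Parse(string resultElementXml)
    {
        // .Value returns the text content with entities resolved once,
        // which leaves a bare '&' in the redirect URL.
        var inner = XElement.Parse(resultElementXml).Value.Trim();

        // Re-escape only the bare ampersands so the payload is well-formed.
        var fixedXml = BareAmpersand.Replace(inner, "&amp;");
        var element = XElement.Parse(fixedXml);

        var serializer = new XmlSerializer(typeof(XxxService));
        using (var reader = element.CreateReader())
        {
            return (XxxService)serializer.Deserialize(reader);
        }
    }
}
```

Calling PayloadParser.Parse with the sample element above should give an Application whose RedirectUrl contains a plain & again. The regex is a heuristic: a query-string parameter that happens to look like an entity (for example &lt;) would be left alone, so treat this as a workaround for a service that under-escapes its payload rather than a general fix.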

har07
  • Thanks to both of you for your help - har07, you have the final and complete answer that works, but Ewan you get an upvote from me too. – John Ohara Apr 21 '15 at 10:22
  • hmmm. I'd worry that xml != html with the ampersands escaped. but why html in the first place?! – Ewan Apr 21 '15 at 12:04