
I have the following code:

XElement element = new XElement("test", "a&b");

Here, element.LastNode contains the value "a&amp;b".

I want it to be "a&b".

How do I replace this?
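A minimal sketch, assuming .NET with System.Xml.Linq, of what the question observes. The text node stores the unescaped string; only its serialized form (what ToString() returns) contains "&amp;". The class and method names are hypothetical, chosen for illustration.

```csharp
using System.Xml.Linq;

// Hypothetical helper reproducing the question's setup.
public static class LastNodeDemo
{
    // Returns the serialized form of the element's last node and the raw text it stores.
    public static (string Serialized, string Text) Inspect(string content)
    {
        var element = new XElement("test", content);

        // The last (and only) child node is an XText holding the content.
        var text = (XText)element.LastNode;

        // ToString() writes the node as XML, so '&' comes out as "&amp;".
        // XText.Value is the unescaped character data.
        return (text.ToString(), text.Value);
    }
}
```

With "a&b" as input, the serialized form is "a&amp;b" while the stored text is still "a&b".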

Miki
user2392525

3 Answers


Wait a moment,

<test>a&b</test>

is not well-formed XML. You cannot produce XML that looks like this; the XML specification does not allow a literal & in character data except as the start of a reference.

& has a special meaning: it starts an entity or character reference, which stands for a character that might otherwise be invalid at that position. A literal '&' character is encoded as &amp; in XML.
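A small sketch of that rule in practice, assuming LINQ to XML: XElement.Parse rejects the bare & with an XmlException, but accepts the escaped form. The helper name IsWellFormed is hypothetical.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class WellFormedCheck
{
    // Hypothetical helper: true if the string parses as a single well-formed XML element.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XElement.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            // A bare '&' that does not start a valid reference ends up here.
            return false;
        }
    }
}
```

IsWellFormed("") is false, while IsWellFormed("<test>a&amp;b</test>") is true.
<test>
if (WellFormedCheck.IsWellFormed("<test>a&b</test>")) throw new System.Exception("bare & should be rejected");
if (!WellFormedCheck.IsWellFormed("<test>a&amp;b</test>")) throw new System.Exception("escaped & should be accepted");
</test>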


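For what it's worth, HTML also reserves & for character references, and validators for older HTML versions reject a bare & like the one here (HTML5 is more lenient when the & does not look like a reference):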

<!DOCTYPE html> <html> <body> a&b </body> </html>


If I write this code:

const string Value = "a&b";
var element = new XElement("test", Value);
Debug.Assert(
    string.CompareOrdinal(Value, element.Value) == 0,
    "XElement is mad");

it runs without error: XElement escapes the text when writing XML and unescapes it when reading, as necessary.

To get the unescaped (decoded) text of the element, simply read XElement.Value.
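A minimal round-trip sketch, assuming LINQ to XML: serializing the element escapes the &, and parsing that XML and reading Value gives the original text back. The class and method names are illustrative only.

```csharp
using System.Xml.Linq;

public static class RoundTrip
{
    // Hypothetical helper: serializes text inside an element, parses it back, returns both.
    public static (string Xml, string Decoded) Run(string text)
    {
        var original = new XElement("test", text);

        // Serialization escapes '&', e.g. "<test>a&amp;b</test>".
        string xml = original.ToString();

        // Parsing turns "&amp;" back into '&'; Value returns the decoded text.
        var parsed = XElement.Parse(xml);
        return (xml, parsed.Value);
    }
}
```

For "a&b" this yields the XML "" and the decoded value "a&b", so no manual decoding step is needed.
<test>
var (xml, decoded) = RoundTrip.Run("a&b");
if (xml != "<test>a&amp;b</test>") throw new System.Exception("unexpected xml: " + xml);
if (decoded != "a&b") throw new System.Exception("unexpected decoded value: " + decoded);
</test>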

If you want to make a document that looks like

<test>a&b</test>

you can, but it is neither XML nor HTML, so tools for working with XML or HTML won't intentionally help you. You'll have to write your own readers, writers and parsers.
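If you really do need that output, one escape hatch (my assumption, not something this answer proposes) is XmlWriter.WriteRaw, which writes its argument without any escaping. The sketch below shows it; the helper name is hypothetical, and the result is deliberately not well-formed XML, so XML parsers will reject it.

```csharp
using System.Text;
using System.Xml;

public static class RawWriter
{
    // Hypothetical helper: writes <name>content</name> WITHOUT escaping content.
    // Warning: the output is not well-formed XML if content contains '&' or '<'.
    public static string WriteUnescaped(string name, string content)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (var writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement(name);
            writer.WriteRaw(content);   // WriteRaw bypasses escaping entirely
            writer.WriteEndElement();
        }   // disposing the writer flushes everything into sb

        return sb.ToString();
    }
}
```

WriteUnescaped("test", "a&b") produces "", which XElement.Parse would then refuse to read.
<test>
string raw = RawWriter.WriteUnescaped("test", "a&b");
if (raw != "<test>a&b</test>") throw new System.Exception("unexpected output: " + raw);
</test>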

Jodrell

The & is a reserved character, so it will always be encoded. You therefore have to decode it:

Is the HttpUtility.HtmlDecode(String) method an option?

Usage:

string decoded = HttpUtility.HtmlDecode("a&amp;b");
// returns "a&b"
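A sketch comparing the two decoding routes on the question's element. It swaps in System.Net.WebUtility.HtmlDecode for the HttpUtility.HtmlDecode call above (both HTML-decode a string), and the class and method names are hypothetical. For & both routes agree, although, as a comment below notes, HTML and XML escaping do not match for every character.

```csharp
using System.Net;
using System.Xml.Linq;

public static class DecodeComparison
{
    // Hypothetical helper: HTML-decodes the serialized text node and also reads XElement.Value.
    public static (string HtmlDecoded, string XmlValue) Decode(string content)
    {
        var element = new XElement("test", content);

        // Serialized text node, e.g. "a&amp;b" for "a&b".
        string serialized = element.LastNode.ToString();

        // HTML decoding, as suggested in this answer (WebUtility instead of HttpUtility).
        string htmlDecoded = WebUtility.HtmlDecode(serialized);

        // XML-aware route: Value is already unescaped.
        return (htmlDecoded, element.Value);
    }
}
```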
hwcverwe
  • This works, but I cannot assign this string value to XElement.LastNode. – user2392525 Oct 09 '14 at 09:56
  • Of course you can. But it will be automatically encoded. – Krisztián Balla Oct 09 '14 at 09:59
  • You could just read `element.Value`. Be careful HTML-decoding XML: it works fine for `&`, but not all characters are escaped the same way by both standards. – Jodrell Oct 09 '14 at 10:37
  • @user2392525 adding `a&b` without encoding it would result in syntactically incorrect XML: `a&b` is a syntax error, whereas `a&amp;b` is correct. You only need to decode the values. – hwcverwe Oct 14 '14 at 11:22
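The comments point out that a value assigned to XElement.LastNode is automatically encoded again. A minimal sketch of that behavior, assuming LINQ to XML; the helper name is hypothetical.

```csharp
using System.Xml.Linq;

public static class AssignDemo
{
    // Hypothetical helper: sets the last text node to new content and returns the serialized element.
    public static string AssignAndSerialize(string content)
    {
        var element = new XElement("test", "placeholder");
        var text = (XText)element.LastNode;

        // The assignment stores the string as-is ("a&b")...
        text.Value = content;

        // ...and serialization escapes it again, so '&' is written as "&amp;".
        return element.ToString();
    }
}
```

AssignAndSerialize("a&b") returns "": the stored value is "a&b", and the escaping only appears in the XML text.
<test>
string assigned = AssignDemo.AssignAndSerialize("a&b");
if (assigned != "<test>a&amp;b</test>") throw new System.Exception("unexpected output: " + assigned);
</test>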

Try the following:

using System.Text.RegularExpressions;

public static string GetTextFromHTML(string htmlstring)
{
    // replace all tags with spaces...
    htmlstring = Regex.Replace(htmlstring, @"<(.|\n)*?>", " ");

    // ...turn non-breaking spaces into ordinary spaces before collapsing,
    // so they cannot reintroduce double spaces afterwards
    htmlstring = htmlstring.Replace("&nbsp;", " ");

    // ...then eliminate all double spaces
    while (htmlstring.Contains("  "))
    {
        htmlstring = htmlstring.Replace("  ", " ");
    }

    // finally decode the & character code
    htmlstring = htmlstring.Replace("&amp;", "&");

    return htmlstring;
}
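The regex approach above targets HTML fragments. If the input is well-formed XML, as in the question, a sketch that lets the parser both drop the tags and decode the entities could look like this (the class and method names are hypothetical). Unlike the regex version, it does not insert spaces where tags were: XElement.Value simply concatenates the text of all descendant text nodes.

```csharp
using System.Xml.Linq;

public static class XmlText
{
    // Hypothetical helper: returns the concatenated, unescaped text of all descendant text nodes.
    public static string GetText(string xml) => XElement.Parse(xml).Value;
}
```

For example, GetText("") returns "a&bc".
<test>
string plain = XmlText.GetText("<test>a&amp;b<i>c</i></test>");
if (plain != "a&bc") throw new System.Exception("unexpected text: " + plain);
</test>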
SHEKHAR SHETE