0

I need to manipulate an XML string. Here it is:

<div class="addthis_toolbox addthis_default_style ">
<a class="addthis_button_facebook_like" fb:like:layout="button_count"></a>
<a class="addthis_button_tweet"></a>
<a class="addthis_counter addthis_pill_style"></a>
</div>

I thought I would load it into an XmlDocument, but XmlDocument.LoadXml() throws an error about the ":" character, because of the fb:like:layout attribute.

What I need to do is add an addthis:url attribute to the first element with an addthis_toolbox or addthis_button class.

I'm fairly confident that I can find the element with the correct class, but much less confident that I can add a "composite" attribute like that, especially since I can't even load the string into an XmlDocument.

Did I miss something? Is there a better or simpler way?

Thanks

Deduplicator
thomasb

3 Answers

5

The XML is well-formed according to the XML 1.0 recommendation, but it is not namespace-well-formed according to the XML Namespaces 1.0 recommendation. So you should be able to parse it if your XML parser has a switch to disable namespace processing. I've no idea if .NET's XmlDocument parser has such a switch.
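For what it's worth, .NET's XmlTextReader does expose a Namespaces property, and an XmlDocument can be loaded from such a reader. The following is only a minimal sketch of that idea, not a tested solution for the AddThis markup: the LenientXml class and LoadWithoutNamespaces method are names invented here for illustration, and whether prefixed attributes survive being written back out is something you would need to check yourself.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, named for illustration only.
public static class LenientXml
{
    // Loads the markup with namespace processing switched off, so that
    // colons in names such as fb:like:layout are treated as ordinary
    // name characters instead of prefix separators.
    public static XmlDocument LoadWithoutNamespaces(string xml)
    {
        var doc = new XmlDocument();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.Namespaces = false; // must be set before the first Read
            doc.Load(reader);
        }
        return doc;
    }

    public static void Main()
    {
        string xml = @"<div class=""addthis_toolbox addthis_default_style "">
<a class=""addthis_button_facebook_like"" fb:like:layout=""button_count""></a>
</div>";

        XmlDocument doc = LoadWithoutNamespaces(xml);

        // The root element is now reachable through the usual DOM API.
        Console.WriteLine(doc.DocumentElement.GetAttribute("class"));
    }
}
```

Adding an attribute whose name contains a colon and serializing the document again may need similar care on the writing side (XmlTextWriter has a Namespaces property as well), so treat this as a starting point rather than a complete answer.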

Michael Kay
4

The provided XML isn't namespace-well-formed, so a namespace-aware XML parser will refuse to load it.

You can pre-process the text so that it becomes well-formed XML, and then manipulate it with an XML engine.

EDIT:

See also: RegEx match open tags except XHTML self-contained tags

However, a regex may be the most appropriate tool in your case, provided the structure of the input HTML is regular. For example, you can use this regex

(?x)
(?<=<)[^>]*
class="[^"]*
\b(?:addthis_toolbox|addthis_button)\b
[^"]*"
[^>]*

to find div class="addthis_toolbox addthis_default_style " and then append the new attribute to the matched text, e.g.:

string xml = @"<div class=""addthis_toolbox addthis_default_style "">
<a class=""addthis_button_facebook_like"" fb:like:layout=""button_count""></a>
<a class=""addthis_button_tweet""></a>
<a class=""addthis_counter addthis_pill_style""></a>
</div>
";

const string Pattern = @"(?xs)
    (?<=<)([^>]*
    class=""[^""]*
    \b(?:addthis_toolbox|addthis_button)\b
    [^""]*"")
    [^>]*
";

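// Requires: using System.Text.RegularExpressions;
// The count of 1 limits the replacement to the first matching tag,
// since the question asks for the first element only.
var result = new Regex(Pattern).Replace(xml, "$0 addthis:url=\"value\"", 1);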

Result:

<div class="addthis_toolbox addthis_default_style " addthis:url="value">
<a class="addthis_button_facebook_like" fb:like:layout="button_count"></a>
<a class="addthis_button_tweet"></a>
<a class="addthis_counter addthis_pill_style"></a>
</div>
Kirill Polishchuk
1

http://64.215.254.44/forum/viewtopic.php?f=5&t=26854

You can actually remove the following: fb:like:layout="button_count" since button count is the default layout.

Mads Hansen