1

I want to extract an SVG element by its class name with a C# regex.

For example I have this:

<path fill="none" ... class="highcharts-tracker highcharts-tracker" ... stroke-width="22" zIndex="2" style=""/>

And I want to delete every path element that has highcharts-tracker as a class name, using:

 new Regex("");

Does anybody know how?

RajeshKdev
Dragouf
  • What XML API are you using? Do you *really* need a regular expression for the class matching, or could you just specify the class names you want to remove, and split the attribute value? – Jon Skeet May 06 '13 at 09:46
  • I just have svg code as a string. That's why I want to use Regex. – Dragouf May 06 '13 at 09:47
  • No, no, no. You should *not* be treating it just as a string. It's meant to be XML, so you should *use* it as XML. Anything else is going to cause large amounts of pain. – Jon Skeet May 06 '13 at 09:48
  • But it's malformed, and I just want to treat it as a string and also output it as a string later. – Dragouf May 06 '13 at 09:49
  • If it's malformed then it's not really SVG, is it? Why is it malformed? (Chances are it's due to some other tool taking this hacky approach of just treating it as text instead of XML...) – Jon Skeet May 06 '13 at 09:50
  • Yes, I'm just treating it as text. I don't want to render it. – Dragouf May 06 '13 at 09:52
  • But it's still garbage... I'd expect actual svg tools to require *valid* data. Even if you're not rendering it, presumably you want *something* to eventually. What's the point of processing something if nothing will be able to render it? – Jon Skeet May 06 '13 at 09:54
  • You should not parse xml with regex. http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Stephane Rolland May 06 '13 at 12:39

1 Answer

4

In LINQ to XML, this is pretty straightforward:

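// Requires: using System.Linq; using System.Xml.Linq;
var classToRemove = "highcharts-tracker";
var doc = XDocument.Parse(svg);
// Descendants("path") only matches elements with no namespace. If the document
// declares xmlns="http://www.w3.org/2000/svg", use an XNamespace for that URI
// and query doc.Descendants(ns + "path") instead.
var elements = doc.Descendants("path")
                  .Where(x => x.Attribute("class") != null &&
                              x.Attribute("class")
                               .Value.Split(' ')
                               .Contains(classToRemove));
// Remove all the elements which match the query
elements.Remove();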

You should not use regular expressions to try to parse XML... XML is very well handled by existing APIs, and regular expressions are not an appropriate tool.
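For the string-in, string-out case, here is a minimal end-to-end sketch of the same approach. The `SvgCleaner` class, the `RemovePathsByClass` helper and the sample SVG are hypothetical names and data made up for illustration. Matching on `Name.LocalName` is just one way to cope with documents that declare the SVG default namespace; it assumes you don't care which namespace the `path` element is in.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class SvgCleaner
{
    // Hypothetical helper: removes every <path> whose class list contains className
    // and returns the remaining document as a string.
    public static string RemovePathsByClass(string svg, string className)
    {
        var doc = XDocument.Parse(svg);

        doc.Descendants()
           // Compare the local name so this matches with or without a default namespace
           .Where(e => e.Name.LocalName == "path")
           // A missing class attribute is treated as an empty class list
           .Where(e => ((string)e.Attribute("class") ?? "")
                           .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                           .Contains(className))
           .Remove();

        // DisableFormatting avoids adding indentation to the output
        return doc.ToString(SaveOptions.DisableFormatting);
    }

    public static void Main()
    {
        // Assumed sample input, not taken from the question
        var svg = "<svg xmlns=\"http://www.w3.org/2000/svg\">" +
                  "<path class=\"highcharts-tracker\" d=\"M 0 0\"/>" +
                  "<path class=\"highcharts-series\" d=\"M 1 1\"/>" +
                  "</svg>";

        Console.WriteLine(RemovePathsByClass(svg, "highcharts-tracker"));
    }
}
```

Splitting the attribute value on spaces, rather than doing a substring check, means a class such as `highcharts-tracker-line` would not be removed by mistake.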

EDIT: If it's malformed (which you should have said to start with) you should try to work out why it's malformed and fix it before you try to do any other processing. There's really no excuse for XML being malformed these days... there are plenty of good XML APIs for just about every platform in existence.
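As a starting point for working out why the input is malformed, you could let the parser tell you where it first gives up. This is only a sketch: the `SvgValidator` class, the `DescribeXmlError` helper and the broken sample are hypothetical, and it reports just the first error `XDocument.Parse` hits, not every problem in the document.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class SvgValidator
{
    // Hypothetical helper: returns null if the text parses as XML,
    // otherwise a description of the first error the parser reports.
    public static string DescribeXmlError(string svg)
    {
        try
        {
            XDocument.Parse(svg);
            return null;
        }
        catch (XmlException ex)
        {
            // LineNumber and LinePosition point at where parsing failed
            return $"Line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        }
    }

    public static void Main()
    {
        // Assumed sample: the <path> element is never closed
        var broken = "<svg><path class=\"highcharts-tracker\"></svg>";

        Console.WriteLine(DescribeXmlError(broken) ?? "Well-formed");
    }
}
```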

Jon Skeet
  • OK thanks, I'll try it. But if I then want to render the elements as text, how do I do that? – Dragouf May 06 '13 at 09:57
  • @Dragouf: If it's not valid XML, there's no such thing as an element, is there? I don't understand why you object to the idea of actually getting valid data to start with. – Jon Skeet May 06 '13 at 10:01
  • OK, I mean that if it's valid SVG, how can I render the "elements" variable as text? – Dragouf May 06 '13 at 10:10
  • As much as I agree that if you're parsing XML you should go with an actual parser, in this instance I think using a regex is perfectly acceptable. Loading the full document into an `XDocument` could have an impact on performance (particularly if it's a large document). – James May 06 '13 at 10:11
  • @Dragouf: Well the `elements` variable is a sequence of elements. How do you *want* to render that sequence, and where? – Jon Skeet May 06 '13 at 10:11
  • @James: That depends on whether you think correctness is important. I suspect that constructing a genuinely correct regex here is going to be *very* hard - and we have no indication whatsoever that performance would be a problem here. Why reject a natural solution which uses an appropriate API for the problem domain just because there *might* be a problem with performance? (Maybe there'll be a problem with performance of regex too - maybe we can't load the whole document into memory...) – Jon Skeet May 06 '13 at 10:13
  • In the end I want to output it as a string. – Dragouf May 06 '13 at 10:13
  • @Dragouf: I thought you wanted the whole *document* as a string. `elements` in my code is just the elements you want to remove. You can call `XDocument.ToString()` to convert the XML document back to a string representation - or just save it directly to a stream, for example. – Jon Skeet May 06 '13 at 10:14