I still have trouble applying regex syntax, and the fact that different languages use different dialects doesn't make things easier.

I want to remove all links containing a CID (whatever that is) by replacing them with an empty string. So I defined:

private Regex regex = new Regex("<a href=\"cid:{[a-f0-9-]*}\">([^<])</a>");

and went for a replacement:

html = regex.Replace(html,"");

But for some reason or another, my Regex does not work.

The result still contains:

<a href="cid:c524ae03-7ac7-4823-9a28-af17a6acf12f">Test.txt</a>

I also tried the following:

new Regex("<a href=\"cid:[a-f0-9-]*\">([^<])</a>");
new Regex("<a href=\"cid:{([a-f0-9-]*)}\">([^<])</a>");
new Regex("<a href=\"cid:([a-f0-9-]*)\">([^<])</a>");

but I can't figure out what's wrong with my regex...

Alexander
  • [Do not use regular expressions to parse HTML](http://stackoverflow.com/a/1732454/803335) – Stephan Apr 22 '14 at 11:33
  • why do you use { and } parentheses? – Axarydax Apr 22 '14 at 11:35
  • Dear @Stephan, I am not parsing HTML. Building a DOM tree from possibly malformed HTML, just to remove links which have a specific layout, may be destroying the malformedness, which may be or may not be intended by the author of the HTML document. Furthermore I think, but you may prove me wrong, that all comments are lost this way, since comments are not part of the DOM tree. – Alexander Apr 22 '14 at 11:43
  • @Alexander, I just wanted to point out, that this is not a trivial task. What about nested tags or even nested CDATA sections... I'm not sure about the point with maintaining malformedness, but at least every XML parser that I worked with can be configured to put comments into the DOM. – Stephan Apr 22 '14 at 12:27

1 Answer

Try this:

<a href=\"cid:[a-f0-9-]*\">[^<]+<\/a>
                               |here

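The problem with your regex was that [^<] without a quantifier matches exactly one character, so it could never match a whole link text such as Test.txt. Adding + makes it match one or more characters up to the next <. The curly braces are dropped as well, since the href in your sample contains none.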
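
For reference, here is a minimal sketch of how the corrected pattern might be used from C#. The class name, the field name and the sample HTML string are assumptions made up for illustration; only the link itself comes from the question. The \/ escape from the pattern above is not needed in .NET, so a plain / is used.

```csharp
using System;
using System.Text.RegularExpressions;

class CidLinkDemo
{
    // Corrected pattern as a C# verbatim string; "" is an escaped double quote.
    // [^<]+ matches one or more characters of link text, not just one.
    private static readonly Regex CidLink =
        new Regex(@"<a href=""cid:[a-f0-9-]*"">[^<]+</a>");

    static void Main()
    {
        // Hypothetical input built around the link shown in the question.
        string html = "<p>See <a href=\"cid:c524ae03-7ac7-4823-9a28-af17a6acf12f\">Test.txt</a> attached.</p>";

        // Replace every matching CID link with an empty string.
        string cleaned = CidLink.Replace(html, "");

        Console.WriteLine(cleaned); // prints: <p>See  attached.</p>
    }
}
```

Note that this only matches links written exactly in this form (lowercase hex ID, no extra attributes, no nested tags in the link text); anything else is left untouched.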

Demo here: http://regex101.com/r/hH6qW1

aelor