C# regex definition

Question

I still have problems applying regex syntax, and the fact that some languages use different dialects doesn't make things easier.

I want to remove all links containing a CID (whatever that is) by replacing them with an empty string. So I defined:

private Regex regex = new Regex("<a href=\"cid:{[a-f0-9-]*}\">([^<])</a>");

and went for a replacement:

html = regex.Replace(html,"");

But for some reason or another, my Regex does not work.

The result still contains:

<a href="cid:c524ae03-7ac7-4823-9a28-af17a6acf12f">Test.txt</a>

I also tried the following:

new Regex("<a href=\"cid:[a-f0-9-]*\">([^<])</a>");
new Regex("<a href=\"cid:{([a-f0-9-]*)}\">([^<])</a>");
new Regex("<a href=\"cid:([a-f0-9-]*)\">([^<])</a>");

but I can't figure out what's wrong with my regex...

[Do not use regular expressions to parse HTML](http://stackoverflow.com/a/1732454/803335) — Stephan, Apr 22 '14 at 11:33
Dear @Stephan, I am not parsing HTML. Building a DOM tree from possibly malformed HTML, just to remove links which have a specific layout, may be destroying the malformedness, which may be or may not be intended by the author of the HTML document. Furthermore I think, but you may prove me wrong, that all comments are lost this way, since comments are not part of the DOM tree. — Alexander, Apr 22 '14 at 11:43
@Alexander, I just wanted to point out, that this is not a trivial task. What about nested tags or even nested CDATA sections... I'm not sure about the point with maintaining malformedness, but at least every XML parser that I worked with can be configured to put comments into the DOM. — Stephan, Apr 22 '14 at 12:27

score 3 · Accepted Answer · answered Apr 22 '14 at 11:33

Try this:

<a href=\"cid:[a-f0-9-]*\">[^<]+<\/a>
                               |here

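The problem with your regex was that ([^<]) matches exactly one character of link text, not the whole text. [^<]+ matches one or more characters up to the closing tag. I also left out the { and }, which do not appear in your sample link.

Demo here: http://regex101.com/r/hH6qW1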
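For completeness, here is a minimal sketch of how the corrected pattern could be used from C#. The class and method names (`CidLinkRemover`, `Strip`) are made up for illustration and are not from the question. One detail differs from the demo above: the `\/` escape is only needed where `/` is the pattern delimiter, and inside a regular C# string literal `\/` is not a valid escape sequence, so the sketch uses a plain `/`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class CidLinkRemover
{
    // [^<]+ matches the whole link text (one or more non-'<' characters),
    // where the original ([^<]) matched exactly one character.
    // No braces around the id, because the sample link has none.
    private static readonly Regex CidLink =
        new Regex("<a href=\"cid:[a-f0-9-]*\">[^<]+</a>");

    // Returns html with every matching cid link replaced by an empty string.
    public static string Strip(string html)
    {
        return CidLink.Replace(html, "");
    }
}
```

Calling `CidLinkRemover.Strip(html)` on a string containing the link from the question should remove it and leave the rest untouched. Note that this still assumes the exact attribute layout shown in the question (a single `href` attribute, double quotes, lowercase hex id, no nested tags inside the link).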

aelor
