Replace the MS Outlook html source string using regex?

Question

I have an application that reads an email's HTML source and downloads all of its attachments. This works fine, except that Microsoft Outlook produces some odd source values, for example:

<img width="163" height="39" id="Picture_x0020_1" src="cid:image001.png@01CD7F6C.70CD2320" alt="Description: Description: Description: cid:image001.png@01CC6D59.AEF6D270">

Firstly, I'd like to change the src to just Attachments\image001.png. Also, the alt should just be image001.png, not this long, odd string. I'm not really sure how to go about this.

[Don't use regex to parse html](http://stackoverflow.com/a/1732454/26226). — jrummell, Aug 23 '12 at 17:43
I think the title was fine, I'm just pointing out that Regex is usually highly unreliable at parsing html. — jrummell, Aug 23 '12 at 17:52
If you are sure that the text will always have the same pattern and the same format and will **never** change, then you **should** use regex. That is unlikely with HTML files, but I think regex would be a good option here. — Anirudha, Aug 23 '12 at 17:53

myermian · Accepted Answer · 2012-08-23T18:14:12.203

You should use Regex (I updated the tags in your question to reflect this):

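// Regex.Replace returns a new string, so assign the result back to text.
text = Regex.Replace(text, @"src=""cid:(?<FileName>[^@]+)@[^""]*""", @"src=""Attachments\${FileName}""",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
// Reduce the alt text to just the file name that follows "cid:".
text = Regex.Replace(text, @"alt=""[^.]*cid:(?<FileName>[^@]+)@[^""]*""", @"alt=""${FileName}""",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);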

I'm sure there are more efficient ways of doing this, but that's what I could come up with.
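To show how the two replacements might fit together, here is a minimal sketch that wraps them in one method and runs it on the `<img>` tag from the question. The class name `OutlookHtmlCleaner` and the method name `RewriteCidReferences` are made up for illustration; the patterns are the same ones as above. Note that the `alt` pattern assumes no period appears in the text before `cid:`, which holds for the sample but may not hold for every message.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OutlookHtmlCleaner
{
    // Hypothetical helper: applies both replacements and returns the result.
    public static string RewriteCidReferences(string html)
    {
        const RegexOptions options =
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;

        // src="cid:image001.png@01CD..." becomes src="Attachments\image001.png"
        html = Regex.Replace(html,
            @"src=""cid:(?<FileName>[^@]+)@[^""]*""",
            @"src=""Attachments\${FileName}""",
            options);

        // alt="Description: ... cid:image001.png@01CC..." becomes alt="image001.png"
        html = Regex.Replace(html,
            @"alt=""[^.]*cid:(?<FileName>[^@]+)@[^""]*""",
            @"alt=""${FileName}""",
            options);

        return html;
    }

    public static void Main()
    {
        // The sample tag from the question.
        string input =
            "<img width=\"163\" height=\"39\" id=\"Picture_x0020_1\" " +
            "src=\"cid:image001.png@01CD7F6C.70CD2320\" " +
            "alt=\"Description: Description: Description: " +
            "cid:image001.png@01CC6D59.AEF6D270\">";

        Console.WriteLine(RewriteCidReferences(input));
    }
}
```

For the sample tag this should print `<img width="163" height="39" id="Picture_x0020_1" src="Attachments\image001.png" alt="image001.png">`. Tags whose `src` does not start with `cid:` are left untouched, since neither pattern matches them.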
