
I have an application that reads the source HTML of an email and downloads all of its attachments. This works fine, except that Microsoft Outlook produces some odd source values, for example:

```html
<img width="163" height="39" id="Picture_x0020_1" src="cid:image001.png@01CD7F6C.70CD2320" alt="Description: Description: Description: cid:image001.png@01CC6D59.AEF6D270">
```

First, I'd like to change the source to just `Attachments\image001.png`. The alt should also be just `image001.png` rather than this long, odd alt text. I'm not really sure how to go about this.
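As a rough illustration of the mapping being asked for, here is a minimal C# sketch that is not part of the original post. It assumes the value after `cid:` has the form `file@id`, as in the Outlook sample, so the file name is everything between the last `cid:` and the following `@`. `CidHelper` and `CidToFileName` are hypothetical names.

```csharp
using System;

// Hypothetical helper (not from the original post). Assumes an Outlook-style
// Content-ID of the form "<file name>@<id>", as in the sample tag.
public static class CidHelper
{
    public static string CidToFileName(string value)
    {
        // Use the last "cid:" so alt text like "Description: ... cid:x@y" also works.
        int start = value.LastIndexOf("cid:", StringComparison.OrdinalIgnoreCase);
        if (start < 0) return value;               // not a cid reference; leave unchanged
        start += "cid:".Length;
        int at = value.IndexOf('@', start);
        // No '@' means there is no id part, so keep everything after "cid:".
        return at < 0 ? value.Substring(start) : value.Substring(start, at - start);
    }

    public static void Main()
    {
        string src = "cid:image001.png@01CD7F6C.70CD2320";
        string alt = "Description: Description: Description: cid:image001.png@01CC6D59.AEF6D270";

        Console.WriteLine(@"Attachments\" + CidToFileName(src)); // Attachments\image001.png
        Console.WriteLine(CidToFileName(alt));                   // image001.png
    }
}
```

This only handles an attribute value that has already been extracted; finding the attributes inside the HTML is a separate step.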

Asked by michael; edited by myermian.
  • [Don't use regex to parse HTML](http://stackoverflow.com/a/1732454/26226). – jrummell Aug 23 '12 at 17:43
  • I think the title was fine; I'm just pointing out that regex is usually highly unreliable at parsing HTML. – jrummell Aug 23 '12 at 17:52
  • If you are sure that the `text` will always have the same pattern and the same format and will **NEVER** change, you **SHOULD** use regex. That is unlikely with HTML files, but I think regex would be a good option here. – Anirudha Aug 23 '12 at 17:53

1 Answer


You should use Regex (I updated the tags in your question to reflect this):

```csharp
using System.Text.RegularExpressions;

// Regex.Replace returns a new string, so assign each result back to text.
text = Regex.Replace(text, @"src=""cid:(?<FileName>[^@]+)@[^""]*""", @"src=""Attachments\${FileName}""",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
text = Regex.Replace(text, @"alt=""[^.]*cid:(?<FileName>[^@]+)@[^""]*""", @"alt=""${FileName}""",
    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
```

I'm sure there are more efficient ways of doing this, but that's what I could come up with.
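For a quick check, the two replacements can be wrapped in a small console program and run against the sample tag from the question. This is a minimal sketch: the `RewriteDemo` class and `Rewrite` method are hypothetical names, and the patterns are copied unchanged from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RewriteDemo
{
    // Applies the answer's two replacements (hypothetical wrapper method).
    public static string Rewrite(string text)
    {
        const RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;
        // src="cid:name@id" -> src="Attachments\name"
        text = Regex.Replace(text, @"src=""cid:(?<FileName>[^@]+)@[^""]*""",
            @"src=""Attachments\${FileName}""", options);
        // alt="...cid:name@id" -> alt="name" (only if no '.' appears before "cid:")
        text = Regex.Replace(text, @"alt=""[^.]*cid:(?<FileName>[^@]+)@[^""]*""",
            @"alt=""${FileName}""", options);
        return text;
    }

    public static void Main()
    {
        string html = "<img width=\"163\" height=\"39\" id=\"Picture_x0020_1\" " +
            "src=\"cid:image001.png@01CD7F6C.70CD2320\" " +
            "alt=\"Description: Description: Description: cid:image001.png@01CC6D59.AEF6D270\">";
        Console.WriteLine(Rewrite(html));
    }
}
```

Running it should print `<img width="163" height="39" id="Picture_x0020_1" src="Attachments\image001.png" alt="image001.png">`. The patterns assume double-quoted attributes and alt text with no `.` before `cid:`; a single-quoted `src='cid:...'` attribute, for example, is left unchanged, which is the kind of brittleness the comments warn about.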

Answered by myermian.