I created a regex to extract the src and alt attributes from img tags (the images are now hosted on CloudFront). I use a substitution to turn each tag into Markdown.

Some images are wrapped in a link, so I added an optional group for the surrounding tag: (<a href.*)? Since I use backreferences to build the Markdown, I don't want these optional groups to be captured. How can I do this?

This is my regex:

/(<a href.*)?<img.*?src="http:\/\/www.example.com\/uploads(.*?)" alt="(.*?)".*?\/?>(<\/a>)?/gm

This is my substitution:

![$2](http://example.cloudfront.net/images$1)
Hedge

1 Answer

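Agreed that parsing HTML with regex is not a nice idea. But since you want to hand out some reputation points, I'll be a cherry picker... :-) Use non-capturing groups, (?:...), for the optional parts. They group without creating a backreference, so $1 and $2 still refer to the path and the alt text: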

/(?:<a href.*)?<img.*?src="http:\/\/www.example.com\/uploads(.*?)" alt="(.*?)".*?\/?>(?:<\/a>)?/gm
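For illustration, here is a minimal sketch of that pattern applied with the substitution from the question. It assumes a JavaScript-style engine (the /gm flags and $1 syntax suggest one, but the question does not say), and the sample HTML and variable names are made up for the example.

```javascript
// The pattern from the answer: (?:...) groups without capturing, so only
// the upload path ($1) and the alt text ($2) get a group number.
const imgPattern = /(?:<a href.*)?<img.*?src="http:\/\/www.example.com\/uploads(.*?)" alt="(.*?)".*?\/?>(?:<\/a>)?/gm;

// Hypothetical sample input: one bare image and one image wrapped in a link.
const html = [
  '<img src="http://www.example.com/uploads/cat.png" alt="A cat" />',
  '<a href="http://www.example.com/big/dog.png"><img class="thumb" src="http://www.example.com/uploads/dog.png" alt="A dog"></a>',
].join("\n");

// Same substitution as in the question.
const markdown = html.replace(
  imgPattern,
  "![$2](http://example.cloudfront.net/images$1)"
);

console.log(markdown);
// ![A cat](http://example.cloudfront.net/images/cat.png)
// ![A dog](http://example.cloudfront.net/images/dog.png)
```

With the original capturing groups, $1 would instead hold the `<a href...>` text and $2 the path, so the same substitution would put the wrong pieces into the Markdown. As a side note, the unescaped dots in www.example.com match any character; writing them as \. would be stricter.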

I will state again here at the end for clarity that parsing HTML with regex is not a good idea.

Peter Bowers