I want to locate all image tags in my HTML whose src does not start with http:// and prepend http:// to the src attribute.

I have a regex that finds all img tags whose src does not start with http://, but I'm having trouble prepending http:// to the src attribute alone. How can I achieve this using a regex replace?

<img [^<]*src="(?!http://)(?<source>[^"]*)"[^<]*/>

The source group will contain the src value. I just need the equivalent of source = "http://" + source. How can I write this in C#?

Alex J

1 Answer

Since you don't want to break the existing tags, you need to capture the parts of the match you are not changing in groups of their own, so that you can include them in the replacement pattern:

(<img [^<]*src=")(?!http://)(?<source>[^"]*)("[^<]*/>)

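Then the replace is trivial. Note that .NET numbers unnamed groups first, so the closing part of the tag is $2 and the named group source is $3: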

regex.Replace(input, "$1http://$3$2");
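For completeness, here is a minimal self-contained sketch of the replace above. The class name ImgSrcFixer, the method name PrependHttp, and the sample input are made up for illustration. It refers to the named group as ${source} rather than $3, which should behave the same way but does not depend on how the groups happen to be numbered.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; the names here are illustrative only.
public static class ImgSrcFixer
{
    // Group 1: everything up to and including src="
    // Group "source" (also $3): the src value, only if it does not start with http://
    // Group 2: the closing quote and the rest of the tag
    private static readonly Regex ImgSrc = new Regex(
        "(<img [^<]*src=\")(?!http://)(?<source>[^\"]*)(\"[^<]*/>)");

    public static string PrependHttp(string html)
    {
        // Equivalent to "$1http://$3$2", but using the group name.
        return ImgSrc.Replace(html, "$1http://${source}$2");
    }

    public static void Main()
    {
        string input = "<img alt=\"logo\" src=\"example.com/logo.png\" />"
                     + "<img src=\"http://example.com/a.png\" />";

        // Only the first tag is rewritten; the second already starts with http://
        Console.WriteLine(PrependHttp(input));
    }
}
```

With the sample input above, the first tag's src becomes http://example.com/logo.png and the second tag is left untouched. Like the original pattern, this only handles self-closing tags with double-quoted src attributes.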

(This might work for your use case, but I should mention that, in general, parsing HTML with regex is not considered a good idea.)

driis
  • It's actually regex.Replace(input, "$1http://$3$2"); with $2 instead of $4. You should update your answer. – Alex J Nov 07 '11 at 20:47
  • Thank you, it works great. I also finally understood the concept of regex.Replace. Thanks for your help. – Alex J Nov 07 '11 at 21:10