I am writing C# code that converts relative URLs to absolute URLs in the href and src attributes of HTML entered in a RichTextBox. The conversion runs when the user clicks a button and uses a base path that the user enters. I need a regex that matches only the relative URLs inside href and src attributes so that I can convert them to absolute ones. For example, if the user enters the path https://example.com/page and the HTML in the RichTextBox is:

<a href="https://example.com">click</a>
<a href="page1.html">click</a>
<img src="/img1.png" />
<img src="../img2.png" />

This is the result I want for the HTML:

<a href="https://example.com">click</a> //this doesn't change
<a href="https://example.com/page/page1.html">click</a>
<img src="https://example.com/page/img1.png" />
<img src="https://example.com/img2.png" />

So far I have only been able to come up with a regex that matches href attributes, .*href=(["])(.*?)\1, but I can't work out one that does the relative-to-absolute conversion described above.

TBA
Reem Rizk

1 Answer

A couple of tips for you:

  1. Please don't use regex to parse HTML. It could break the universe. See here for more info: RegEx match open tags except XHTML self-contained tags
  2. Instead, you can use HTML Agility Pack to parse the HTML, as suggested in this SO answer.
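
As a rough sketch of the second tip, the following C# parses the markup with HTML Agility Pack (the HtmlAgilityPack NuGet package) and resolves each href and src value with System.Uri instead of a regex. The names RelativeUrlFixer, MakeUrlsAbsolute, ToAbsolute and HasScheme are made up for illustration. Appending a trailing slash to the user's path, so that "page" is treated as a folder, is an assumption based on the expected output in the question.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper class; none of these names come from the question.
public static class RelativeUrlFixer
{
    // Rewrites relative href/src values in the given HTML against basePath.
    public static string MakeUrlsAbsolute(string html, string basePath)
    {
        // Assumption: the user's path names a folder, so make sure it ends with "/".
        var baseUri = new Uri(
            basePath.EndsWith("/", StringComparison.Ordinal) ? basePath : basePath + "/");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*[@href or @src]");
        if (nodes == null)
        {
            return html;
        }

        foreach (HtmlNode node in nodes)
        {
            foreach (string name in new[] { "href", "src" })
            {
                HtmlAttribute attribute = node.Attributes[name];
                if (attribute != null)
                {
                    attribute.Value = ToAbsolute(baseUri, attribute.Value);
                }
            }
        }

        return doc.DocumentNode.OuterHtml;
    }

    // Resolves one attribute value; values that are already absolute are returned as-is.
    public static string ToAbsolute(Uri baseUri, string value)
    {
        if (string.IsNullOrWhiteSpace(value)
            || value.StartsWith("#", StringComparison.Ordinal) // in-page anchor
            || HasScheme(value))                               // https:, mailto:, data:, ...
        {
            return value;
        }

        return Uri.TryCreate(baseUri, value, out var resolved) ? resolved.AbsoluteUri : value;
    }

    // True when a ':' appears before any '/', '?' or '#', i.e. the value starts with a scheme.
    public static bool HasScheme(string value)
    {
        int colon = value.IndexOf(':');
        if (colon <= 0)
        {
            return false;
        }

        int delimiter = value.IndexOfAny(new[] { '/', '?', '#' });
        return delimiter < 0 || colon < delimiter;
    }
}
```

Called as RelativeUrlFixer.MakeUrlsAbsolute(richTextBox1.Text, "https://example.com/page") (richTextBox1 being a hypothetical control name), this should turn page1.html into https://example.com/page/page1.html and ../img2.png into https://example.com/img2.png, while leaving https://example.com untouched. Note that standard URL resolution sends /img1.png to https://example.com/img1.png, not to https://example.com/page/img1.png as in the question, so root-relative paths would need special handling if that output is really what you want. HTML Agility Pack also re-serializes the markup, so whitespace or self-closing slashes may not come back exactly as typed.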
Hunter Tran