I want to use C# to parse HTML data.
Imagine every character of the HTML carrying a bit: true = "html/code", false = "display/content". With those bits, you would know which parts of the HTML are "code".
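To make the idea of a per-character bit concrete, here is a rough sketch of the kind of mask I have in mind. It assumes that everything from a '<' through the next '>' is "code", which is a simplification: comments, script/style bodies, and a '>' inside a quoted attribute value are not handled. The names HtmlMask and Build are placeholders, not from any library.

```csharp
using System;

public static class HtmlMask
{
    // Returns one flag per character of the input:
    // true  = the character is inside a tag ("code"),
    // false = the character is displayed content.
    // Assumption: a tag runs from '<' through the next '>'.
    public static bool[] Build(string html)
    {
        var mask = new bool[html.Length];
        bool inTag = false;

        for (int i = 0; i < html.Length; i++)
        {
            char ch = html[i];

            if (ch == '<')
            {
                inTag = true;      // the '<' itself counts as code
            }

            mask[i] = inTag;

            if (ch == '>')
            {
                inTag = false;     // the '>' was already marked as code above
            }
        }

        return mask;
    }
}
```

For the input <p>x</p> this would mark every character as code except the x. What I do not know is how to get this distinction reliably for real-world HTML.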
Let's use the following HTML example:
<a id="a1" class="c1" attr1="x" attr2="y">a1 c1 attr1</a> <p>a1 c1 attr1 attr2</p>
I want to use C# String.Replace to replace every instance of "a1" with "new1" and every instance of "attr1" with "new2". But I only want the HTML "code" to be affected; all "content" must NOT be changed. The desired result is:
<a id="new1" class="c1" new2="x" attr2="y">a1 c1 attr1</a> <p>a1 c1 attr1 attr2</p>
Note: the desired result still contains 2 instances of "a1" and 2 instances of "attr1" that were not renamed, because they are content.
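For comparison, this is a minimal sketch of what a plain String.Replace over the whole string does. The class and method names are placeholders for illustration only.

```csharp
using System;

public static class NaiveReplace
{
    // Applies both replacements to the entire string,
    // with no distinction between code and content.
    public static string Run(string html)
    {
        return html
            .Replace("a1", "new1")
            .Replace("attr1", "new2");
    }
}
```

Run on the example above, this renames the content as well, so the link text becomes "new1 c1 new2" and the paragraph text becomes "new1 c1 new2 attr2". That is exactly what I need to avoid.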
I can't find any existing library or software that would help in this effort.
EDIT1: HtmlAgilityPack might be an option. However, I'm still no closer to understanding how I could use it to differentiate between code and not-code.
EDIT2: Please keep in mind that this question is a simplification of my real problem, reduced as much as possible. Renaming things based on whether or not they are in quotes won't be the answer. I specifically need to figure out how to differentiate between code and not-code.
EDIT3: I have included "attr1" as a second String.Replace. I need to replace both attribute names AND attribute values, and I need to be able to distinguish between code and not-code.
Any suggestions?