I am not good at regex. I have been reading up on it, but it is getting me nowhere.

I have a large HTML string that contains elements like this:

<input data-val="true" data-val-required="The SearchType field is required." id="SearchType" name="UserSearchType" type="hidden" value="something">

I am trying to write a regex that finds all of these input elements and then changes the type attribute to label, whatever its current value is.

Even just getting a collection of the matching strings from a regex would be great.

For example:

 string testHtml =
    "abc <input data-val='true' data-val-required='The SearchType field is required.'  id='UserSearchType' name='UserSearchType' type='hidden' value='Scos'> abc <input data-val='true' data-val-required='The UserSearchType field is required.' id='UserSearchType' name='SearchType' type='hidden' value='sco'>";

I am trying either to find every <input ....> and build a collection, or to find <input ..type='text'..> and change it to <input ..type='label'..>.

Please let me know if the question is vague or needs more detail.

theftprevention
fireholster
  • Can you add to your question what `testHtml` should look like after applying a regex to it? – Stephan Jan 30 '14 at 19:54
  • Regex is not suited to solving this problem. Use an HTML parser. [Html Agility Pack](http://htmlagilitypack.codeplex.com/) is recommended in other threads, though I have not used it myself. – adamdc78 Jan 30 '14 at 21:06
  • You guys are right, it was getting difficult with regex as some exception conditions were popping up. I am trying the Agility Pack. It has worked so far. – fireholster Jan 31 '14 at 17:23

2 Answers

You can do something like this. You will get a lot of pushback for parsing XML with regex, but this should work for your example.

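// Requires: using System.Text.RegularExpressions;
// Group 1: the tag up to and including the quote after type=; group 2: the old type value; group 3: the rest of the tag.
Regex r = new Regex("(<[^>]*type=['\"])([a-zA-Z]+)(['\"][^>]*>)");
string text = "abc <input data-val='true' data-val-required='The SearchType field is required.'  id='UserSearchType' name='UserSearchType' type='hidden' value='Scos'> abc <input data-val='true' data-val-required='The UserSearchType field is required.' id='UserSearchType' name='SearchType' type='hidden' value='sco'>";
// Keep groups 1 and 3 and put "label" in place of the old type value.
string replaced = r.Replace(text,"$1label$3");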
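
Since the question also asks for a collection of the matching strings, here is a minimal sketch of one way to gather them with Regex.Matches. The helper name FindInputTags and the shortened sample string are made up for illustration, and the pattern assumes that no attribute value contains a literal > character.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Written as top-level statements (C# 9 or later); on older compilers, move this into a Main method.

// Hypothetical helper: collects every <input ...> tag found in the HTML string.
// Assumption: no attribute value contains a literal '>'.
static List<string> FindInputTags(string html)
{
    var tags = new List<string>();
    // "<input", a word boundary, then everything up to the next '>'.
    foreach (Match m in Regex.Matches(html, @"<input\b[^>]*>", RegexOptions.IgnoreCase))
    {
        tags.Add(m.Value);
    }
    return tags;
}

// Shortened sample input, for illustration only.
string text = "abc <input id='a' type='hidden' value='Scos'> abc <input id='b' type='hidden' value='sco'>";
foreach (string tag in FindInputTags(text))
{
    Console.WriteLine(tag);
}
```

Each entry in the returned list is the full text of one tag, so it can be inspected or rewritten individually afterwards.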
Bozman

Give this a try:

<((?:[^=]+=(?:"(?:[^\\][^"])+"|'(?:[^\\][^'])+'|[^'"\s]+?)\s+)*)type=(?:"(?:[^\\][^"])+?"|'(?:[^\\][^'])+?'|[^'"\s]+?)([^/>]*?/?)>

In C#, you would use:

string testHtml = "abc <input data-val='true' data-val-required='The SearchType field is required.'  id='UserSearchType' name='UserSearchType' type='hidden' value='Scos'> abc <input data-val='true' data-val-required='The UserSearchType field is required.' id='UserSearchType' name='SearchType' type='hidden' value='sco'>";
string pattern = "<((?:[^=]+=(?:\"(?:[^\\\\][^\"])+\"|'(?:[^\\\\][^'])+'|[^'\"\\s]+?)\\s+)*)type=(?:\"(?:[^\\\\][^\"])+?\"|'(?:[^\\\\][^'])+?'|[^'\"\\s]+?)([^/>]*?/?)>";
Regex rgx = new Regex(pattern);
string newHtml = rgx.Replace(testHtml, "<$1type='label'$2>");

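This is a pretty hefty regular expression. It is meant to account for any number of other attributes on the tag, and for attribute values enclosed in double quotes ("), single quotes ('), or no quotes at all. One caveat: the quoted-value sub-patterns such as (?:[^\\][^'])+ consume characters two at a time, so test it against your real markup before relying on it. Let me know if it helps!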
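
As a smaller alternative to experiment with (a sketch, not the pattern above), the same job can be split into two simpler steps: first isolate each input tag, then rewrite the type attribute inside that tag only. The helper name SetInputTypeToLabel and the shortened sample string are made up for illustration. The sketch assumes that no attribute value contains a literal >, and that the text " type=" does not appear inside some other attribute's value.

```csharp
using System;
using System.Text.RegularExpressions;

// Written as top-level statements (C# 9 or later); on older compilers, move this into a Main method.

// Hypothetical helper: sets the type attribute of every <input> tag to 'label'.
static string SetInputTypeToLabel(string html)
{
    // Step 1: isolate each <input ...> tag (assumes no '>' inside attribute values).
    return Regex.Replace(html, @"<input\b[^>]*>", tagMatch =>
        // Step 2: inside that tag only, replace the value of the type attribute,
        // whether it is double-quoted, single-quoted, or unquoted.
        // The leading \s keeps attributes such as data-type from matching.
        Regex.Replace(
            tagMatch.Value,
            @"(\stype\s*=\s*)(?:""[^""]*""|'[^']*'|[^\s'"">]+)",
            "${1}'label'",
            RegexOptions.IgnoreCase),
        RegexOptions.IgnoreCase);
}

// Shortened sample input, for illustration only.
string testHtml = "abc <input id='a' type='hidden' value='Scos'> abc <input type=text name=q>";
Console.WriteLine(SetInputTypeToLabel(testHtml));
```

Splitting the work this way keeps each pattern short enough to read, at the cost of running a second regex per tag. For markup that breaks the assumptions above, an HTML parser is the safer route.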

theftprevention