
Suppose I have some sample text like the following:

;&nbsp; </span>&lt;year&gt;<o:p></o:p>
</span>&lt;</span><span style=3D'font-size:9.0pt;mso-bidi-font-family:Arial'>manufacturer&gt;</span><span                  style=3D'mso-bidi-font-family:Arial'>
</span>&lt;model&gt;<o:p>
</span>&lt;<span class=3DSpellE>serial_number</span>&gt;<o:p>
</span>&lt;<span class=3DSpellE>accessories_value</span>&gt;<o:p></o:p></span>
</span>&lt;<span class=3DSpellE>accessories_list</span>&gt;
p;&nbsp; </span>&lt;<span class=3DSpellE>worldwide_yn</span>&gt;
</span>&lt;</b><span class=3DSpellE><span style=3D'mso-no-proof:yes'>pet_name</span></span><span style=3D'mso-   no-proof:yes'>&gt;</span><o:p></o:p></p>

I want to find and replace every occurrence of the following pattern:

&lt; any_html_tags markers_text any_html_tags &gt; 

Here:

any_html_tags: optional; zero or more HTML tags of any kind, opening or closing.

markers_text: takes one of two forms, either xxxxx (any number of characters) or xxxx_xxxxxx (words joined by an underscore, of any length).

For example, I want to be able to find the following texts in the sample file:

1) &lt;year&gt;
2) &lt;</span><span style=3D'font-size:9.0pt;mso-bidi-font-family:Arial'>manufacturer&gt;
3) &lt;model&gt;
4) &lt;<span class=3DSpellE>serial_number</span>&gt;
5) &lt;<span class=3DSpellE>accessories_value</span>&gt;
6) &lt;<span class=3DSpellE>accessories_list</span>&gt;
7) &lt;<span class=3DSpellE>worldwide_yn</span>&gt;
8) &lt;</b><span class=3DSpellE><span style=3D'mso-no-proof:yes'>pet_name</span></span><span style=3D'mso-no-proof:yes'>&gt;

and replace them with the corresponding items:

1) &lt;year&gt;
2) </span><span style=3D'font-size:9.0pt;mso-bidi-font-family:Arial'>&lt;manufacturer&gt;
3) &lt;model&gt;
4) <span class=3DSpellE></span>&lt;serial_number&gt;
5) <span class=3DSpellE></span>&lt;accessories_value&gt;
6) <span class=3DSpellE></span>&lt;accessories_list&gt;
7) <span class=3DSpellE></span>&lt;worldwide_yn&gt;
8) </b><span class=3DSpellE><span style=3D'mso-no-proof:yes'></span></span><span style=3D'mso-no-proof:yes'>&lt;pet_name&gt;

So basically, I want every tag between &lt; and &gt; (everything except markers_text) to be removed from there and placed before the &lt;. I am doing this with the C# Regex methods.

Can you please suggest a proper regular expression to achieve this?

The final result for the sample should look like:

;&nbsp; </span>&lt;year&gt;<o:p></o:p>
</span></span><span style=3D'font-size:9.0pt;mso-bidi-font-family:Arial'>&lt;manufacturer&gt;</span><span     style=3D'mso-bidi-font-family:Arial'>
 </span>&lt;model&gt;<o:p>
 </span><span class=3DSpellE></span>&lt;serial_number&gt;<o:p>
 </span><span class=3DSpellE></span>&lt;accessories_value&gt;<o:p></o:p></span>
  </span><span class=3DSpellE></span>&lt;accessories_list&gt;
 p;&nbsp; </span><span class=3DSpellE></span>&lt;worldwide_yn&gt;
</b><span class=3DSpellE><span style=3D'mso-no-proof:yes'></span></span><span style=3D'mso-no-  proof:yes'>&lt;pet_name&gt;
mohits00691

1 Answer


This search/replace is probably what you are looking for:

pattern:

&lt;((?:</?span[^>]*>)*)(\w+)((?:</?span[^>]*>)*)&gt;

replacement:

$1&lt;$2&gt;$3

online demo (see the "Context" tab)
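
For reference, here is a minimal sketch of how the pattern and replacement above might be applied with C#'s Regex.Replace. The class name MarkerCleanup and the choice of sample inputs (copied from the question) are illustrative assumptions. The second pattern is not part of the answer: it is one possible variant that accepts any tag rather than only span, and moves both tag groups in front of the marker, which is what the question's expected output shows.

```csharp
using System;
using System.Text.RegularExpressions;

class MarkerCleanup
{
    // Pattern from the answer: span tags before and after the marker name
    // are captured in groups 1 and 3, the marker name itself in group 2.
    const string SpanOnly =
        @"&lt;((?:</?span[^>]*>)*)(\w+)((?:</?span[^>]*>)*)&gt;";

    // Hypothetical variant (an assumption, not from the answer):
    // allow any tag, not only span, on either side of the marker name.
    const string AnyTag =
        @"&lt;((?:<[^>]+>)*)(\w+)((?:<[^>]+>)*)&gt;";

    static void Main()
    {
        string serial =
            "</span>&lt;<span class=3DSpellE>serial_number</span>&gt;<o:p>";

        // Replacement from the answer: leading tags, clean marker, trailing tags.
        Console.WriteLine(Regex.Replace(serial, SpanOnly, "$1&lt;$2&gt;$3"));
        // prints: </span><span class=3DSpellE>&lt;serial_number&gt;</span><o:p>

        string pet =
            "&lt;</b><span class=3DSpellE><span style=3D'mso-no-proof:yes'>" +
            "pet_name</span></span><span style=3D'mso-no-proof:yes'>&gt;";

        // Variant replacement: both tag groups first, then the clean marker.
        Console.WriteLine(Regex.Replace(pet, AnyTag, "$1$3&lt;$2&gt;"));
        // prints: </b><span class=3DSpellE><span style=3D'mso-no-proof:yes'></span></span><span style=3D'mso-no-proof:yes'>&lt;pet_name&gt;
    }
}
```

Note that the span-only pattern leaves the pet_name line unchanged, because the </b> directly after &lt; is not a span tag and so the match fails there. Either pattern works on the text only and does not check that the moved tags remain properly nested.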

Casimir et Hippolyte
  • Thanks @Casimir et Hippolyte – mohits00691 Nov 11 '14 at 12:34
  • @rene I understand it ... but it is true :-( – mohits00691 Nov 12 '14 at 07:58
  • Your edit renders this answer obsolete. Don't become a [help vampire](http://meta.stackexchange.com/questions/19665/the-help-vampire-problem) and don't try to parse html with [regex](http://stackoverflow.com/a/1732454/578411). Consider whether a new question is more appropriate, one where you use this answer as a starting point and show your own effort. – rene Nov 12 '14 at 08:02