
I need to find a particular pattern in HTML text using a regular expression.

For example, my string is:

<table border="0" cellspacing="0" cellpadding="0" width="100%"><tbody><tr><td><p align="justify"><u>Counsel appeared</u></p><p align="justify"><a name="COUNSEL" id="COUNSEL"></a>K. P. Garg CA<b>for the Appellant</b>.: A. K. Monga, Sr. DR <b>for theRespondent</b></p><p align="justify"><b><u><a name="JUDGE" id="JUDGE"></a>R. P.TOLANI, JM.</u></b></p><p align="justify">testing</p>..........and so on 

and I want to remove this portion from the HTML text: <p align="justify"><u>Counsel appeared</u></p><p align="justify"><a name="COUNSEL" id="COUNSEL"></a>K. P. Garg CA<b>for the Appellant</b>.: A. K. Monga, Sr. DR <b>for theRespondent</b></p>. The text enclosed in the HTML tags is dynamic.

For this I have written the following regular expression:

gsub(/<p align="justify"><u>counsel appeared<\/u><\/p><p align="justify"><a name="counsel" id="counsel"><\/a>.*<b>.*<\/b><\/p>/i, '')

but it removes everything from `counsel appeared</u></p>` to the end of the string.

So how do I remove only that portion from the HTML string above? Can anyone help me modify my regular expression?

Deepti Kakade
    I don't mean to be *that guy*, but maybe you should use an html parser instead of relying on regex. Take a look at this famous SO answer: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454 – gion_13 Feb 28 '15 at 07:10

1 Answer


Try the following pattern (keep the case-insensitive `i` flag from your original, since the pattern is written in lowercase):

<p\s+?align\s*?=\s*?"justify">\s*?<u>\s*?counsel appeared\s*?<\/u>\s*?<\/p>\s*?<p\s+?align\s*?=\s*?"justify">\s*?<a\s+?name\s*?=\s*?"counsel"\s+?id\s*?=\s*?"counsel">\s*?<\/a>.*?<b>.*?<\/b>\s*?<\/p>
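The key change is `.*?` instead of `.*`: the greedy form runs to the last `</b></p>` in the string, while the lazy form stops at the first one that completes the match. Here is a minimal sketch of the difference in Ruby, using the question's own pattern with only the quantifiers changed. The helper names `remove_counsel_block` and `remove_counsel_block_greedy` are made up for illustration, and the sample string is a shortened copy of the one in the question.

```ruby
# Hypothetical helper: the question's pattern with lazy quantifiers (.*?),
# so the match ends at the first </b></p> that completes it.
def remove_counsel_block(html)
  pattern = /<p align="justify"><u>counsel appeared<\/u><\/p><p align="justify"><a name="counsel" id="counsel"><\/a>.*?<b>.*?<\/b><\/p>/i
  html.gsub(pattern, '')
end

# Hypothetical helper: the original greedy pattern from the question,
# kept only to show how far it overshoots.
def remove_counsel_block_greedy(html)
  pattern = /<p align="justify"><u>counsel appeared<\/u><\/p><p align="justify"><a name="counsel" id="counsel"><\/a>.*<b>.*<\/b><\/p>/i
  html.gsub(pattern, '')
end

# Shortened sample based on the string in the question.
sample = '<td><p align="justify"><u>Counsel appeared</u></p>' \
         '<p align="justify"><a name="COUNSEL" id="COUNSEL"></a>K. P. Garg CA' \
         '<b>for the Appellant</b>.: A. K. Monga, Sr. DR <b>for theRespondent</b></p>' \
         '<p align="justify"><b><u><a name="JUDGE" id="JUDGE"></a>R. P.TOLANI, JM.</u></b></p>' \
         '<p align="justify">testing</p>'

puts remove_counsel_block(sample)        # JUDGE paragraph is kept
puts remove_counsel_block_greedy(sample) # JUDGE paragraph is removed as well
```

Note that `.` does not match newlines by default in Ruby, so if the real HTML spans several lines you would probably also need the `m` flag. As the comment on the question says, an HTML parser is likely the more robust choice if the markup varies much.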

victor