1

I am using Simple HTML DOM to parse HTML, but I am not able to load a <p> tag when it is nested inside another <p> tag:

<p>Hello there <p>Some Content </p>outer content <p>Some More content</p></p>

I don't know how to remove the inner <p></p> tags using a regex.

My expected output is:

<p>Hello there Some content outer content Some More content</p>

Could someone please help me get this done?

Angu

3 Answers

1

Assuming that your whole problematic <p> tag is on a single line, you can use the following regex:

((?!^)<p>)|(<\/p>(?!$))

((?!^)<p>) matches every <p> tag except the <p> at the beginning of the string.

(<\/p>(?!$)) matches every </p> tag except the </p> at the end of the string.

You can then replace the captured <p> and </p> tags with an empty string to remove them.

Here is a working demo
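As a minimal sketch of how this first regex could be applied in PHP (the helper name `strip_inner_p_tags` is made up for illustration, and `~` is used as the delimiter so the slash in `</p>` needs no escaping), assuming the whole string is one outer paragraph on a single line:

```php
<?php
// Hypothetical helper name; the pattern is the one from the answer above.
function strip_inner_p_tags(string $html): string
{
    // Alternative 1: any <p> that is not at the very start of the string.
    // Alternative 2: any </p> that is not at the very end of the string.
    $pattern = '~((?!^)<p>)|(</p>(?!$))~';

    // Replace every match with an empty string; fall back to the input
    // unchanged if PCRE reports an error (preg_replace returns null then).
    return preg_replace($pattern, '', $html) ?? $html;
}

$input = '<p>Hello there <p>Some Content </p>outer content <p>Some More content</p></p>';
echo strip_inner_p_tags($input), PHP_EOL;
// prints: <p>Hello there Some Content outer content Some More content</p>
```

Because the pattern only protects the first and last tags of the string, it would also strip sibling paragraphs in a full HTML document, so it is only suitable for the single-paragraph case.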

EDIT:

Since your input is an HTML file, you can try this updated regex:

(<p>)((?!<\/p>).)*?(<p>).*?(<\/p>)

(<p>) searches for <p> tag

((?!<\/p>).)*?(<p>) captures <p> tag inside the first <p> tag without any </p> tag in between (nested <p> tag)

.*?(<\/p>) captures the closing tag of the nested <p> .

Just remove capture groups 3 and 4 and you have removed the nested <p> tag. You need to run this repeatedly until there are no more matches.

You can find the updated regex demo here
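The "run it until nothing matches" idea can be sketched in PHP as below. This is not the exact pattern from the answer: it is a regrouped variant (an assumption on my part) in which everything that should survive sits in capture groups 1 and 2, so the nested tags can be dropped with a plain replacement. The function name `unwrap_nested_p` and the `s` flag are also illustrative choices.

```php
<?php
// Hypothetical helper; a regrouped variant of the pattern, not the original.
function unwrap_nested_p(string $html): string
{
    // Group 1: the outer <p> plus the text before the nested <p>,
    //          with no </p> allowed in between.
    // Group 2: the content of the nested paragraph.
    // The nested <p> and its </p> are matched but not captured, so they vanish.
    // The "s" flag (an assumption) lets "." match across line breaks.
    $pattern = '~(<p>(?:(?!</p>).)*?)<p>(.*?)</p>~s';

    do {
        $result = preg_replace($pattern, '$1$2', $html, -1, $count);
        if ($result === null) {
            break; // PCRE error: keep the last good string
        }
        $html = $result;
    } while ($count > 0); // repeat until a pass replaces nothing

    return $html;
}

$input = '<p>Hello there <p>Some Content </p>outer content <p>Some More content</p></p>';
echo unwrap_nested_p($input), PHP_EOL;
// prints: <p>Hello there Some Content outer content Some More content</p>
```

Sibling paragraphs such as `<p>a</p><p>b</p>` are left alone, because the lookahead refuses to cross a `</p>` before reaching the second `<p>`.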

UPDATE:

Use this regex: (.*<p>)(((?!<\/p>).)*?)(<p>)(.*?)(<\/p>)(.*)

and replace each match with \1\2\5\7, which removes only the nested tags.

Demo here
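A rough PHP sketch of this last replacement follows. The function name `remove_nested_p_once` is hypothetical, and the loop around it is my own addition: tracing the pattern by hand on the question's sample, a single pass unwraps only one nested pair, so the replacement is repeated until the string stops changing. No `s` flag is used, so this assumes the paragraph sits on one line.

```php
<?php
// Hypothetical helper: one application of the answer's pattern and replacement.
function remove_nested_p_once(string $line): string
{
    // Groups: 1 = text up to and including the outer <p>, 2 = text before the
    // nested <p>, 4 = nested <p>, 5 = nested content, 6 = nested </p>, 7 = rest.
    // Group 3 is the inner repeated group and is not used in the replacement.
    $pattern = '~(.*<p>)(((?!</p>).)*?)(<p>)(.*?)(</p>)(.*)~';

    // Keep groups 1, 2, 5 and 7; groups 4 and 6 (the nested tags) are dropped.
    return preg_replace($pattern, '\1\2\5\7', $line) ?? $line;
}

$line = '<p>Hello there <p>Some Content </p>outer content <p>Some More content</p></p>';
do {
    $before = $line;
    $line = remove_nested_p_once($line);
} while ($line !== $before); // stop once a pass changes nothing

echo $line, PHP_EOL;
// prints: <p>Hello there Some Content outer content Some More content</p>
```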

Abdul Hameed
  • Can you look at https://regex101.com/r/ycwo96/2 ? The regular expression also removes the content inside the table, but my requirement is that a paragraph tag should be removed only if it is nested. – Angu Feb 21 '17 at 12:42
  • Ah, can you put that `<p>` on a separate line and run the regex I gave? For example, you can first replace the first occurrence of `<p>` with `\n<p>`. – Abdul Hameed Feb 21 '17 at 12:44
  • No, it won't. It should remove the paragraph tag only if it is nested. I am passing the entire HTML as an input string to Simple HTML DOM, and when I look up the "p" tags, I cannot get the whole element if it contains a nested p tag. Nested p tags come out as separate chunks. – Angu Feb 21 '17 at 13:07
  • Mate, the updated regex is removing some content if the <p> is nested. How can I solve that? – Angu Feb 21 '17 at 13:58
  • What content is it removing? Capture groups 3 and 4 capture only the tags and nothing else. As long as you replace only those capture groups, it should not remove anything else. – Abdul Hameed Feb 21 '17 at 14:01
  • With the updated regex, "<p>Hello there <p>Some Content </p>" is getting removed. Can you please explain how to check that? I am really new to regex. Can you please update the working regex with the changes? – Angu Feb 21 '17 at 14:05
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/136257/discussion-between-angu-and-abdul-hameed). – Angu Feb 21 '17 at 14:31
0

Please try this function for removing <p></p> tags:

<?php
// Strip every <p> and </p> (case-insensitive), then wrap the result in a single <p>.
function remove_p($input) {
    $input = str_ireplace(array('<p>', '</p>'), '', $input);
    return "<p>" . $input . "</p>";
}
?>

Please see how to use this function:

<?php $val = "<p>Hello there <p>Some Content </p>outer content <p>Some More content</p></p>";
echo remove_p($val);
?>

I hope this is helpful to you.

Prateek Verma
  • Please check my updated code. Is it OK now, or are you trying to say something else? – Prateek Verma Feb 21 '17 at 12:57
  • It will work for the input given in the question, but not for the input the OP gave in the comments on my answer. Sadly, the input is an HTML file with all kinds of `html` tags, not just `p` tags. – Abdul Hameed Feb 21 '17 at 13:06
-1

Nested <p> tags are not allowed in HTML. Instead, you can use:

<p>Hello there <span>Some Content </span>outer content</p>

See the linked question below for more details:

Nesting <p> won't work while nesting <div> will?

Community
munmun poddar