I want to search for the HTML tags p and /p and replace them with div and /div, but only inside blockquote elements. For example:

<blockquote>
    <p>paragraph 1</p>
</blockquote>
<p>paragraph 1 outside blockquote</p>
<blockquote>
    <p>paragraph 2</p>
    <p>paragraph 3</p>
</blockquote>
<p>paragraph 2 outside blockquote</p>

The search regex is:

(<blockquote>)(.*?)(p>)(.*?)(</blockquote>)

and the replacement string is:

\1\2div>\4\5

The problem is that the p tags outside the blockquotes get changed too once I repeat the "Replace All" command. The regex above replaces only one instance at a time, so I have to run "Replace All" repeatedly until every p is replaced. Is there any way to repeat the regex automatically? (I use EditPad Pro v.7.2.3)

xpw

2 Answers

This is a FAQ in many quarters. Regex is good for many things, but parsing balanced delimiters is not one of them.

You need to read up on the Document Object Model and XPath. Then load your HTML into a DOM, find the nodes you want with XPath, operate on them, and write the result back.
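
The load, query, modify, write-back cycle described above might look roughly like the following Python sketch. It is only an illustration under assumptions: it uses the standard library's `xml.etree.ElementTree` (which supports a limited XPath subset), it assumes the markup is well-formed XML, and the `<root>` wrapper and the helper name `p_to_div_in_blockquotes` are made up for this example. Real-world HTML usually needs a proper HTML parser instead.

```python
import xml.etree.ElementTree as ET

# Sample input copied from the question.
SAMPLE = """<blockquote>
    <p>paragraph 1</p>
</blockquote>
<p>paragraph 1 outside blockquote</p>
<blockquote>
    <p>paragraph 2</p>
    <p>paragraph 3</p>
</blockquote>
<p>paragraph 2 outside blockquote</p>
"""


def p_to_div_in_blockquotes(fragment):
    """Rename every <p> below a <blockquote> to <div> (hypothetical helper)."""
    # Assumption: the fragment is well-formed XML. The synthetic <root>
    # wrapper is needed because the fragment has several top-level elements.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # ElementTree's limited XPath: any <p> at any depth under any <blockquote>.
    for p in root.findall(".//blockquote//p"):
        p.tag = "div"
    # Serialize the children only, so the wrapper does not leak into the output.
    parts = [root.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in root)
    return "".join(parts)


dom_result = p_to_div_in_blockquotes(SAMPLE)
print(dom_result)
```

Because the tree knows which elements are nested where, the p elements outside the blockquotes are never candidates for the change, no matter how often the function is run.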

Phlip
  • And of course, there's the obligatory mention of http://stackoverflow.com/a/1732454/1072112 . :-) – ghoti Sep 22 '15 at 03:13
  • Thanks for your advice, but I hope I can solve it with your help to see what is wrong with the regex. – xpw Sep 22 '15 at 03:18
  • thanks, fish, for the 4454 bump cite why Regexp sucks six-ways to Sunday for HTML. I can think of a zillion workarounds, and have lightly regexped light HTML, under complete control, before. And modern XPaths are a couple hacks from Regexps built in. And, xpw, in `>)(.*?)(p>`, which > is the right one for the first `<p>` you expect to hit? – Phlip Sep 22 '15 at 05:19
  • After the first blockquote has been changed, then "paragraph 1 outside blockquote" below it will be changed too, that is the problem. – xpw Sep 22 '15 at 05:36
  • There is a Regexp feature that stops the munching after one replacement – Phlip Sep 22 '15 at 05:47

Search:

(<blockquote>(?:(?!</?blockquote).)*?)<p>(.*?)</p>((?:(?!</?blockquote).)*</blockquote>)

Replace with:

\1<div>\2</div>\3
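
To illustrate why this pattern needs more than one pass, and how the repetition could be automated outside the editor, here is a small Python sketch. It assumes Python's `re` module with `re.DOTALL` standing in for the editor's "dot matches newline" option; the helper name `replace_until_stable` is invented for the example. Each pass converts at most one p per blockquote, so the loop simply re-applies the substitution until a pass changes nothing.

```python
import re

# Sample input copied from the question.
SAMPLE = """<blockquote>
    <p>paragraph 1</p>
</blockquote>
<p>paragraph 1 outside blockquote</p>
<blockquote>
    <p>paragraph 2</p>
    <p>paragraph 3</p>
</blockquote>
<p>paragraph 2 outside blockquote</p>
"""

# The search pattern from above. re.DOTALL lets "." cross line breaks
# (an assumption about how the editor option is set).
PATTERN = re.compile(
    r"(<blockquote>(?:(?!</?blockquote).)*?)<p>(.*?)</p>"
    r"((?:(?!</?blockquote).)*</blockquote>)",
    re.DOTALL,
)


def replace_until_stable(html):
    """Re-apply the substitution until a pass makes no replacement."""
    passes = 0
    while True:
        html, count = PATTERN.subn(r"\1<div>\2</div>\3", html)
        if count == 0:
            return html, passes
        passes += 1


looped_result, passes_needed = replace_until_stable(SAMPLE)
print(passes_needed)
print(looped_result)
```

The loop terminates because every successful replacement removes one `<p>` from inside a blockquote, and the tempered dots `(?:(?!</?blockquote).)` stop a match from ever leaving the blockquote it started in.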


An alternative would be to replace one tag at a time, reducing the number of times you have to run "replace all". However, I don't know if this will work in EditPad.

Find:

<p>((?:(?!</?blockquote).)*?)</p>(?=(?:(?!</?blockquote).)*</blockquote>)

Replace with:

<div>\1</div>
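
As a rough check of this alternative outside the editor, the following Python sketch applies it once to the sample from the question. It assumes Python's `re` module with `re.DOTALL` in place of the editor's "dot matches newline" option, and the function name `convert` is made up for illustration. Since the lookahead only inspects the text up to the closing blockquote without consuming it, every p inside a blockquote can match in the same pass.

```python
import re

# Sample input copied from the question.
SAMPLE = """<blockquote>
    <p>paragraph 1</p>
</blockquote>
<p>paragraph 1 outside blockquote</p>
<blockquote>
    <p>paragraph 2</p>
    <p>paragraph 3</p>
</blockquote>
<p>paragraph 2 outside blockquote</p>
"""

# A <p>...</p> pair that is followed by </blockquote> before any other
# blockquote tag, i.e. a paragraph that sits inside a blockquote.
INSIDE_BLOCKQUOTE_P = re.compile(
    r"<p>((?:(?!</?blockquote).)*?)</p>"
    r"(?=(?:(?!</?blockquote).)*</blockquote>)",
    re.DOTALL,
)


def convert(html):
    """Single-pass replacement of <p> with <div> inside blockquotes."""
    return INSIDE_BLOCKQUOTE_P.subn(r"<div>\1</div>", html)


single_pass_result, replacements = convert(SAMPLE)
print(replacements)
print(single_pass_result)
```

Like any regex approach to HTML, this relies on the input being simple: nested blockquotes, p tags with attributes, or unclosed tags are not handled.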

Mariano
  • the main problem with that is it will appear to work, for a long time – Phlip Sep 22 '15 at 05:46
  • @Phlip what do you mean? btw, I agree with what you answered, but for the scope of a text editor, it may be easier to use a regex even if it may fail – Mariano Sep 22 '15 at 05:51
  • To Mariano, I wonder if there is any way to repeat the regex automatically? – xpw Sep 22 '15 at 06:09
  • The alternative works in EditPad. It is just one click, the former answer needs two. I tried to add more blockquotes, only one click needed! – xpw Sep 22 '15 at 06:30