1

I have the following HTML text, and in JavaScript I need to capture all the "p" tags that have the class "page-break" and then replace them with arbitrary text.

I need to use a regular expression because this HTML text is going to be processed as text, not as a DOM structure.

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Praesent pellentesque tincidunt adipiscing</p>

<p class="page-break">break</p>

<p>Suspendisse a velit at diam facilisis
egestas sit amet a lectus.</p>

<p class="page-break">other</p>

<p>Donec tristique placerat massa vitae hendrerit. Maecenas nec
massa adipiscing sem venenatis vehicula. Suspendisse mattis pretium
libero quis dignissim. Nulla volutpat imperdiet vehicula. Donec ut
tristique neque.</p>

What prevents me from using a DOM parser is that I plan to insert an HTML element that is not valid. I plan to transform the previous HTML into the following, so I need to parse it first as text to skip HTML validation and then paste it in like this:

 <div class="pag visible">
 <p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Praesent pellentesque tincidunt adipiscing</p>
 </div>
 <div class="pag">   
    <p>Suspendisse a velit at diam facilisis
    egestas sit amet a lectus.</p>
 </div>
 <div class="pag">   
    <p>Donec tristique placerat massa vitae hendrerit. Maecenas nec
    massa adipiscing sem venenatis vehicula. Suspendisse mattis pretium
    libero quis dignissim. Nulla volutpat imperdiet vehicula. Donec ut
    tristique neque.</p>
 </div>

As you can see, every ".page-break" is replaced.

eli.rodriguez

4 Answers

4

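Don't use a regexp to parse HTML; use the DOM instead. If you have a plain string, create a detached element (for example a div or a template) and assign the string to its .innerHTML to get a DOM. A DocumentFragment itself has no .innerHTML property.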

Find your p tags with getElementsByTagName, check their .className and act accordingly.
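
A minimal sketch of that approach, assuming a browser environment where `document` is available. The helper name `replacePageBreaks` and its parameters are made up for illustration; it parses the string in a detached div, so nothing is attached to the live page.

```javascript
// Hypothetical helper (not from the original answer); requires a browser DOM.
function replacePageBreaks(html, replacementText) {
  // A detached <div> parses the string without touching the live page.
  var container = document.createElement('div');
  container.innerHTML = html;

  // getElementsByTagName returns a live collection; copy it into a plain
  // array so the loop is unaffected if elements are later removed.
  var paragraphs = Array.prototype.slice.call(container.getElementsByTagName('p'));
  paragraphs.forEach(function (p) {
    // classList.contains also matches when the element has other classes.
    if (p.classList.contains('page-break')) {
      p.textContent = replacementText;
    }
  });

  // Serialize the modified tree back to a string.
  return container.innerHTML;
}
```

Calling `replacePageBreaks(content, 'New Text')` returns the modified markup as a string. Note that the browser normalizes the markup while parsing, which is the validation step the question wants to avoid.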

Community
Oleg V. Volkov
1
// your content
var content = '<p>Lorem ips...';
// match each <p> whose class attribute is "page-break";
// [\s\S]*? is lazy and also spans line breaks
var regex = /<p\s+class\s*=\s*"page-break"\s*>[\s\S]*?<\/p>/g;
// replace() returns a new string, so assign the result
content = content.replace(regex, "<p>MY TEXT HERE</p>");
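
The same kind of pattern can also produce the page wrappers shown in the question. The following is only a sketch: `splitIntoPages` is a hypothetical helper name, the "pag" and "visible" class names are taken from the question's desired output, and it assumes every page break is written exactly as `<p class="page-break">...</p>` with double quotes and no other attributes.

```javascript
// Hypothetical helper: splits the HTML string at each page-break paragraph
// and wraps every remaining chunk in a <div class="pag">.
function splitIntoPages(html) {
  // Lazy [\s\S]*? stops at the first closing </p> and also spans line breaks.
  var pageBreak = /<p\s+class\s*=\s*"page-break"\s*>[\s\S]*?<\/p>/g;

  return html
    .split(pageBreak)
    .map(function (chunk) { return chunk.trim(); })
    // Drop empty chunks, e.g. when the text starts or ends with a page break.
    .filter(function (chunk) { return chunk.length > 0; })
    .map(function (chunk, index) {
      // Only the first page gets the extra "visible" class.
      var cls = index === 0 ? 'pag visible' : 'pag';
      return '<div class="' + cls + '">\n' + chunk + '\n</div>';
    })
    .join('\n');
}
```

Because the input is treated purely as a string, no HTML validation takes place. The trade-off is that markup written differently (single quotes, extra classes, nested paragraphs) will not be matched.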


ted
0

Have you thought of using jQuery?

$('p.page-break').html('replacement value goes here');

This will replace the contents of each matching <p> with "replacement value goes here".

Alternatively, $('p.page-break').remove(); will remove the <p> elements entirely.

Rich Andrews
0

It is not advisable to parse HTML with regex. You can use XPath to fetch all the <p> elements that meet the specified criteria, iterate over the returned list, and update the textContent of each <p>, as shown in the snippet below.

var pList = document.evaluate("//p[@class='page-break']", document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
// A snapshot result stays usable while the document is modified;
// an iterator result is invalidated by such changes.
for (var i = 0; i < pList.snapshotLength; i++) {
    pList.snapshotItem(i).textContent = "New Text";
}

Explanation

//p[@class='page-break'] fetches all the <p> elements whose class attribute is exactly 'page-break'. The document.evaluate function returns an object of type XPathResult. With the snapshot result type, snapshotLength gives the number of matches and snapshotItem(i) returns each element. You can set the new text using the textContent property.

Vaman Kulkarni