REGEX for HTML in php

Question

I have an HTML file with the following format:

<body>
<p class="Title-P">Compiler</p>
<p class="Heading1-P">kdnkls:</p>
<p class="Normal-P">dsf</p>
<p class="ListParagraph-P">kjsksf</p>
<p class="ListParagraph-P">dsfsf</p>
<p class="ListParagraph-P">sfsfsf</p>
<p class="Heading2-P">fsfs:</p>
</body>

What is a suitable regex to replace these tags:

<p class="Title-P">foo</p> with <h1>foo</h1>
<p class="Heading1-P">kdnkls:</p> with <h2>kdnkls:</h2>
<p class="Normal-P">foo</p> with <p>foo</p>
etc...

I'm using the preg_replace function in PHP, which takes a pattern and a replacement as arguments.

Welcome to Stack Overflow! Please refrain from parsing HTML with RegEx as it will [drive you į̷̷͚̤̤̖̱̦͍͗̒̈̅̄̎n̨͖͓̹͍͎͔͈̝̲͐ͪ͛̃̄͛ṣ̷̵̞̦ͤ̅̉̋ͪ͑͛ͥ͜a̷̘͖̮͔͎͛̇̏̒͆̆͘n͇͔̤̼͙̩͖̭ͤ͋̉͌͟eͥ͒͆ͧͨ̽͞҉̹͍̳̻͢](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) instead. — Madara's Ghost, Aug 11 '12 at 00:47
lol @Truth !!! i'm sure you just copied-pasted the comment... — CdB, Aug 11 '12 at 00:48
@CrisDeBlonde: I'm using [this](http://stackapps.com/q/2116) chrome plugin. — Madara's Ghost, Aug 11 '12 at 00:53

score 3 · Accepted Answer · answered Aug 11 '12 at 00:50

Try:

$html = preg_replace('/<p class="Title-P">(.*?)<\/p>/i', "<h1>$1</h1>", $html);
$html = preg_replace('/<p class="Normal-P">(.*?)<\/p>/i', "<p>$1</p>", $html);

That should work for this specific format. A more robust approach is to parse the document with DOM, make your changes, and then save the document back out.
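
For comparison, here is a minimal sketch of what the DOM route might look like. The function name `convertParagraphs` and the `$map` array are hypothetical names made up for this illustration; only the three class-to-tag pairs come from the question, and anything else (such as `Heading2-P`) would need to be added by hand. PHP's DOM extension has no direct way to rename an element, so the sketch creates a new element, moves the children across, and swaps it in.

```php
<?php
// Hypothetical mapping from the question's class names to target tags.
// Only these three pairs appear in the question; extend as needed.
$map = [
    'Title-P'    => 'h1',
    'Heading1-P' => 'h2',
    'Normal-P'   => 'p',
];

function convertParagraphs(string $html, array $map): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Copy the live node list into an array first: replacing nodes while
    // iterating a live DOMNodeList can skip elements.
    $paragraphs = iterator_to_array($doc->getElementsByTagName('p'));

    foreach ($paragraphs as $p) {
        $class = $p->getAttribute('class');
        if (!isset($map[$class])) {
            continue; // leave paragraphs with unknown classes untouched
        }
        // Build the replacement element without the class attribute.
        $new = $doc->createElement($map[$class]);
        // Move all children (text and nested elements) to the new element.
        while ($p->firstChild) {
            $new->appendChild($p->firstChild);
        }
        $p->parentNode->replaceChild($new, $p);
    }

    return $doc->saveHTML();
}

$html = '<body>
<p class="Title-P">Compiler</p>
<p class="Heading1-P">kdnkls:</p>
<p class="Normal-P">dsf</p>
<p class="ListParagraph-P">kjsksf</p>
</body>';

$result = convertParagraphs($html, $map);
echo $result;
```

Note that `saveHTML()` on the whole document also emits a doctype and an `<html>` wrapper, so the output is a full document and not just the original fragment.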

answered Aug 11 '12 at 00:50

drew010


You are aware I can break that regex in approx 0.8 seconds, right? – Madara's Ghost Aug 11 '12 at 00:53
I'm aware of many things. His HTML file has a specific format which that matches... – drew010 Aug 11 '12 at 00:56
@user1576848 DOM is a true HTML/XHTML parser and can parse the whole HTML document into an object and you can easily access certain tags and search nodes within the document. While regex can be used to match certain patterns that may be HTML, it isn't well suited for more advanced parsing of HTML because matching the correct closing tags can be difficult or overly complicated. For any serious manipulation or access to nodes within an (X)HTML document, DOM is the way to go in PHP. – drew010 Aug 13 '12 at 04:05
But in this case (an HTML file with a specific format), will using regex affect performance and speed? Or is it better to use DOM, given that regex is working fine for my case? – Ibrahim.I Aug 13 '12 at 05:21
@user1576848 My view differs from a lot of SO; as you can see from the comments, people go nuts over seeing regex used for anything with HTML. I think regex is fine for certain HTML matching or replacing *IF YOU UNDERSTAND* that any minor change to the HTML format can stop your regex from matching. Creating overly complex regexps for HTML is bad practice: people (even yourself) will look at it later, won't easily understand what it does, and will spend a lot of time examining it (especially *when* it breaks). THAT SAID: regex can be faster (when written efficiently) than using DOM – drew010 Aug 13 '12 at 05:26
...since DOM has to parse the entire document structure into memory. With very large documents, DOM can use too much memory, and a well-written regex can consume much less memory to parse certain content. So if you have a VERY SPECIFIC HTML format that can be easily matched with a simple-to-understand regex, I say fine, go for it; it was made to match patterns. If you want to do something like "find all tags of some kind that have an `onclick` attribute", or "parse all tags of one kind within some parent tag", then I say go with the DOM for those types of cases. That's my 2 cents :)
– drew010 Aug 13 '12 at 05:28
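
As a rough illustration of the kind of query mentioned in the last comment, the sketch below uses `DOMXPath` to collect every element that carries an `onclick` attribute. The sample markup and the `$found` array are invented for this example, and the XPath expression `//*[@onclick]` matches any tag name because the specific tags named in the original comment are not recoverable here.

```php
<?php
// Sample markup made up for this illustration.
$html = '<body>'
      . '<div><a href="#" onclick="go()">one</a><a href="#">two</a></div>'
      . '<p onclick="x()">three</p>'
      . '</body>';

$doc = new DOMDocument();
// Collect libxml warnings about imperfect markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Every element, whatever its tag name, that has an onclick attribute.
$nodes = $xpath->query('//*[@onclick]');

$found = [];
foreach ($nodes as $node) {
    $found[] = $node->nodeName . ': ' . $node->getAttribute('onclick');
}

print_r($found);
```

To restrict the search to a single tag, the `*` in the expression can be replaced with that tag's name, for example `//a[@onclick]`.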
