PHP Regexp Help

Question

I'm going to be working with regular expressions a lot in a new project. I don't have much experience with them and was wondering whether there is a good way of converting HTML to a regular expression.

Anybody know of any good tutorials, or perhaps a generator?

At the moment I need to convert this:

<span class="code" id="code" title="DOESNT MATTER">IMPORTANT<img class="scissors" src="DOESNT MATTER" alt="DOESNT MATTER" /></span>

Thanks!

I am; I'm just modifying it to suit my needs, so I thought maybe this would be a good way. – Jul 15 '11 at 14:03

genesis · Answer 1 · 2011-07-15T14:08:26.827

$text = '<span class="code" id="code" title="DOESNT MATTER">IMPORTANT<img class="scissors" src="DOESNT MATTER" alt="DOESNT MATTER" /></span>';
preg_match('|<span class="code" id="code" title="DOESNT MATTER">IMPORTANT<img class="scissors" src="DOESNT MATTER" alt="DOESNT MATTER" /></span>|', $text, $match);

There's nothing to be "converted" if you're not looking for a specific title, for example; the HTML itself can be used as the pattern, as above.

To pick out that IMPORTANT text, you would use a capture group:

$text = '<span class="code" id="code" title="DOESNT MATTER">IMPORTANT<img class="scissors" src="DOESNT MATTER" alt="DOESNT MATTER" /></span>';
preg_match('|<span class="code" id="code" title="DOESNT MATTER">(.*?)<img class="scissors" src="DOESNT MATTER" alt="DOESNT MATTER" /></span>|', $text, $match);
echo $match[1]; //IMPORTANT
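
The pattern above only matches when the attribute values are literally `DOESNT MATTER`. A minimal sketch of one way to let those values vary, assuming the attributes are always double-quoted and always appear in this order: replace each value with `[^"]*`. The sample attribute values below are made up for illustration.

```php
<?php
// Same structure as above, but with attribute values that vary (hypothetical sample data).
$text = '<span class="code" id="code" title="Some title">IMPORTANT<img class="scissors" src="/img/cut.png" alt="Cut" /></span>';

// [^"]* matches any run of characters that are not a double quote,
// so the title, src and alt values do not need to be known in advance.
$pattern = '|<span class="code" id="code" title="[^"]*">(.*?)<img class="scissors" src="[^"]*" alt="[^"]*" /></span>|';

$captured = null;
if (preg_match($pattern, $text, $match)) {
    $captured = $match[1]; // text between the opening span tag and the img tag
    echo $captured, "\n";
}
```

This still breaks if the attribute order or quoting changes, so it only suits HTML with a guaranteed format.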

How would I go about picking up the "IMPORTANT"? – Jul 15 '11 at 14:03

Jonathan Kuhn · Accepted Answer · 2011-07-15T14:24:52.587

If you just want to get rid of all the HTML around some values, you can use strip_tags().
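
As a quick sketch of that suggestion, using the snippet from the question as input (the variable name `$plain` is just for illustration):

```php
<?php
$html = '<span class="code" id="code" title="DOESNT MATTER">IMPORTANT<img class="scissors" src="DOESNT MATTER" alt="DOESNT MATTER" /></span>';

// strip_tags() removes every tag and keeps only the text between them.
$plain = strip_tags($html);
echo $plain, "\n"; // prints IMPORTANT
```

Note that on a whole page this keeps all of the text, not only the text inside the spans of interest, so it fits best when the input is already a single snippet like this one.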

Edit: moved the regex from my comment into the answer because it did not copy and paste well.

<?php
$html = '<span class="code" id="code" title="DOESNT MATTER">IMPORTANT<img class="scissors" src="DOESNT MATTER" alt="DOESNT MATTER" /></span>';
preg_match_all("/<span\s.*?class=\"code\"[^>]+>(.*?)<img\s.*?class=\"scissors\"[^>]+>/i", $html, $matches);
var_dump($matches);
?>

Also, please note that, as noted in the comments, using a regex to parse HTML is considered bad practice. You should be able to load the HTML into an instance of DOMDocument and use the getElementsByTagName method to get all spans. Then you can loop through those and validate the attributes and text inside.
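
A minimal sketch of the DOMDocument approach, assuming the wanted spans are exactly those with `class="code"` and that their only text is the value to extract (the `$values` array is a name chosen for illustration):

```php
<?php
$html = '<span class="code" id="code" title="DOESNT MATTER">IMPORTANT<img class="scissors" src="DOESNT MATTER" alt="DOESNT MATTER" /></span>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them;
// real-world HTML is often not strictly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$values = [];
foreach ($doc->getElementsByTagName('span') as $span) {
    // Keep only spans whose class attribute is exactly "code".
    if ($span->getAttribute('class') === 'code') {
        $values[] = trim($span->textContent);
    }
}
print_r($values);
```

Because the parser handles attribute order, quoting and whitespace, this keeps working when the markup changes in ways that would break a regex.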

edited Jul 15 '11 at 14:24

answered Jul 15 '11 at 14:07


Yes, but I have a big file of HTML and I'm looking for multiple of the basically same lines. I need the IMPORTANT and nothing else from each. – Jul 15 '11 at 14:09
So what exactly are you looking for? If you just wanted to match 'IMPORTANT' then `/IMPORTANT/` would do it. What exactly makes 'IMPORTANT' important? Is it like a span tag with the class code followed by some text that you want to capture followed by an image tag with the class scissors? `preg_match_all("/<span\s.*?class=\"code\"[^>]+>(.*?)<img\s.*?class=\"scissors\"[^>]+>/i", $html, $matches);var_dump($matches);` – Jonathan Kuhn Jul 15 '11 at 14:15
Didn't seem to find anything? :/ – Jul 15 '11 at 14:19
.. and yes I'm looking to output the text in the span tag. – Jul 15 '11 at 14:20
This is what I'm using at the moment: `preg_match_all("/]+>(.*?)]+>/i", $printable, $matches); foreach($matches as $match) { echo("$match[1]
"); }` – Jul 15 '11 at 14:22
It didn't copy paste well. I moved it into the answer. – Jonathan Kuhn Jul 15 '11 at 14:23
Seems to be getting there, the only problem is that it's not picking up on all the similar spans in the HTML. – Jul 15 '11 at 14:29
It's perfect, I just had to add a "PREG_SET_ORDER". Thank you so much! :) – Jul 15 '11 at 14:33
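
For readers wondering why `PREG_SET_ORDER` made the difference: by default `preg_match_all` groups results by capture group (`$matches[0]` holds every full match, `$matches[1]` every first-group capture), so looping over `$matches` and reading `$match[1]` does not visit each span. A sketch with two made-up spans, using the regex from the answer:

```php
<?php
// Two spans with the same structure (hypothetical sample data).
$html = '<span class="code" id="a" title="x">FIRST<img class="scissors" src="x" alt="x" /></span>'
      . '<span class="code" id="b" title="x">SECOND<img class="scissors" src="x" alt="x" /></span>';

$pattern = "/<span\s.*?class=\"code\"[^>]+>(.*?)<img\s.*?class=\"scissors\"[^>]+>/i";

// With PREG_SET_ORDER each element of $matches is one complete match:
// index 0 is the whole matched text, index 1 is the first capture group.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $match) {
    $found[] = $match[1];
}
print_r($found);
```

Without the flag, the same values are available as `$matches[1]`, so iterating over that array directly is an equivalent alternative.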

score 0 · Answer 3 · edited May 23 '17 at 12:33

It's worth noting that Regular Expressions are not a great solution for parsing HTML. I think they are fine if you have a small chunk of HTML with a guaranteed format, though.

Please see the following great Stack Overflow thread:

RegEx match open tags except XHTML self-contained tags


answered Jul 15 '11 at 14:23

Charles Burns

