Looking for Regular Expressions in PHP

Question

I'm using the preg_match function in PHP to extract some values from an RSS feed. The feed content contains something like this:

<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>

I need to extract "A text with non alphanumeric characters" and "more text with non alphanumeric characters" and save them in a database. I don't know whether regular expressions are the best way to do it.

Thank you so much.

What's the reason for stripping out those chars? And what chars are they? — MatCarey, Jun 11 '12 at 12:01
The best way to do this would be to use a PHP RSS parser and not use regex - some guidance: http://stackoverflow.com/questions/250679/best-way-to-parse-rss-atom-feeds-with-php — Matthew Riches, Jun 11 '12 at 12:01

score 1 · Accepted Answer · answered Jun 11 '12 at 12:03

If you want to use regex (i.e. quick and dirty, and not very maintainable), this will give you the text:

$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

// Match the text between </strong> and </li>
if (preg_match('#</strong>(.*?)</li>#', $input, $matches)) {
    // Remove the text inside brackets, along with the whitespace around it
    echo trim(preg_replace('#\s*\(.*?\)\s*#', '', $matches[1]));
}

Note that this may fail on nested brackets, because the lazy match stops at the first closing bracket.
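To illustrate the nested-bracket caveat, here is a minimal sketch comparing the lazy pattern with a recursive PCRE pattern. The recursive pattern and the sample string are assumptions added for illustration and are not part of the original answer.

```php
<?php
// Hypothetical sample with one level of nesting.
$text = 'A text (more (nested) text), more text';

// Lazy pattern from the answer: it stops at the first ")" and leaves debris behind.
$lazy = trim(preg_replace('#\s*\(.*?\)\s*#', '', $text));

// Recursive alternative: group 1 matches "(", then any mix of
// non-bracket characters or a nested group 1, then the closing ")".
$recursive = trim(preg_replace('~\s*(\((?:[^()]++|(?1))*\))\s*~', '', $text));

echo $lazy, PHP_EOL;
echo $recursive, PHP_EOL;
```

The recursive version relies on PCRE's `(?1)` subpattern recursion, so it only helps when the brackets are balanced.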

Jay

3,285
1
20
19

I don't have enough reputation to comment on other answers, but beware that buckley's won't work (as they have said, but it might not be clear) if the input doesn't have exactly one comma. – Jay Jun 11 '12 at 12:05

score 0 · Answer 2 · answered Jun 11 '12 at 12:04

Given that the structure is always the same, you can use this regex:

</strong>([^,]*),([^<]*)</li>

Group 1 will contain the first fragment and group 2 the other.
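As a rough sketch of how that pattern could be used from PHP, assuming the sample line from the question as input (the `~` delimiters and the variable names are my own choices):

```php
<?php
// Sample input taken from the question.
$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

// "~" is used as the delimiter so the "/" in the closing tags needs no escaping.
$fragments = [];
if (preg_match('~</strong>([^,]*),([^<]*)</li>~', $input, $m)) {
    // Both groups keep their leading space and the "(more text)" part;
    // only the surrounding whitespace is trimmed here.
    $fragments = [trim($m[1]), trim($m[2])];
}

print_r($fragments);
```

The bracketed text is still present in both fragments, so it would need a further cleanup step if it is not wanted.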

Once you start parsing HTML/XML with regexes, it quickly becomes apparent that a full-blown parser is better suited. For a small or throwaway solution, a regex can still be useful.
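For comparison, a parser-based version might look roughly like the sketch below. It assumes PHP's bundled DOM extension is available and that the wanted text is always a direct child of the `<li>` element. The cleanup at the end is my own addition.

```php
<?php
// Sample input taken from the question.
$html = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);

$text = '';
$li = $doc->getElementsByTagName('li')->item(0);
if ($li !== null) {
    foreach ($li->childNodes as $node) {
        // Keep only text that is a direct child of <li>, which skips the <strong> label.
        if ($node->nodeType === XML_TEXT_NODE) {
            $text .= $node->nodeValue;
        }
    }
}

// Drop the bracketed parts, then split on commas and trim each piece.
$text = preg_replace('~\([^)]*\)~', '', $text);
$parts = array_map('trim', explode(',', $text));

print_r($parts);
```

The markup is handled by the parser here, and the regex only touches plain text, which tends to be less fragile if the feed's tags or attributes change.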

score 0 · Answer 3 · answered Jun 11 '12 at 12:06

$str = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';
$str = preg_replace('~^.*?</strong>~', '', $str); // Remove leading markup
$str = preg_replace('~</li>$~', '', $str); // Remove trailing markup
$str = preg_replace('~\([^)]++\)~', '', $str); // Remove text within parentheses
$str = trim($str); // Clean up whitespace
$arr = preg_split('~\s*,\s*~', $str); // Split on the comma
