Getting a substring of a string in an HTML tag with regex in PHP?

Question

Possible Duplicate:
regex help with getting tag content in PHP

First, please no comments about parsing HTML with regex. I know that it is not possible in general, but it should do the job in this case.

I am trying to get the content of <country lan="x">...</country> tags. There are no special cases like <country />, and the PHP DOM parser fails because the content of the tags contains many special characters (MediaWiki text).

So I have some text like

    <country lan="en">


    dsadasd


    {|,'''""" }}|]][][]//\\\\\2r2erfaf<>><<<#<div> --..,;;"!"§$%&/()=?`´´``***+~~~''

    0131ß

    ÄÜÖ#ax
    </country>

My solution at the moment is the pattern

    $pattern = <country lan=\"en\">(.|\t|\r|\n|\s)*<\/country>

which looks like it should match, using

    preg_match_all($pattern, $content, $matches);
    print_r($matches);

But the printed result is just an empty array. How can I extract only the string between the <country lan="x">...</country> tags?
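Two details of the attempt are worth checking, shown here as a sketch on made-up input with two tags. The preg_* functions expect the pattern wrapped in delimiters such as `/.../`, and with the `s` modifier `.` also matches newlines, so the `(.|\t|\r|\n|\s)` alternation is not needed. The quantifier matters too: a greedy `*` runs to the last `</country>` in the input, while a lazy `*?` stops at the first one after each opening tag. The sample string and the variable names `$greedy` and `$lazy` are illustrative only.

```php
<?php
// Illustrative input with two tags; the real input would be the MediaWiki text.
$content = "<country lan=\"en\">first</country>\n<country lan=\"de\">second</country>";

// Greedy: (.*) runs to the LAST </country>, so both tags collapse into one match.
preg_match_all('/<country[^>]*>(.*)<\/country>/s', $content, $greedy);

// Lazy: (.*?) stops at the FIRST </country> after each opening tag.
preg_match_all('/<country[^>]*>(.*?)<\/country>/s', $content, $lazy);

print_r($greedy[1]); // one element spanning both tags
print_r($lazy[1]);   // two elements: "first" and "second"
```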

If I got it right, the OP cannot use DOM parsers because the HTML is invalid. — Álvaro González, Nov 23 '12 at 09:44
If this is too complicated with a regex, why don't you just look for the first string, then for the second string, and take the substring between the two positions? Especially as start and end are fixed strings. You say DOM does not work for you, and it's also clear that regex is too complicated for you, so just use standard string manipulation instead. – hakre, Nov 23 '12 at 10:00
I think the DOM parser does not do the trick because the content between the tags is a mix of wiki markup and HTML, so it seems to be invalid. "Standard string manipulation" is much harder than using regex, because there can be several `...` tags per page. – dnl, Nov 23 '12 at 10:06
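The plain string approach hakre suggests can still cope with several tags per page if the search runs in a loop and resumes after each closing tag. A minimal sketch: the function name `extractBetween` is hypothetical, and it assumes the tags are not nested and that attribute values never contain `>`.

```php
<?php
/**
 * Hypothetical helper: returns the text between each "<country ...>" and the
 * next "</country>", scanning left to right with strpos()/substr().
 * Assumes tags are not nested and attribute values contain no ">".
 */
function extractBetween(string $content): array
{
    $results = [];
    $offset = 0;
    while (($start = strpos($content, '<country', $offset)) !== false) {
        // End of the opening tag, e.g. the ">" in <country lan="en">.
        $openEnd = strpos($content, '>', $start);
        if ($openEnd === false) {
            break;
        }
        $close = strpos($content, '</country>', $openEnd + 1);
        if ($close === false) {
            break; // unmatched opening tag: stop scanning
        }
        $results[] = substr($content, $openEnd + 1, $close - $openEnd - 1);
        // Resume the search just after the closing tag.
        $offset = $close + strlen('</country>');
    }
    return $results;
}

$content = "<country lan=\"en\">first</country> text <country lan=\"de\">second</country>";
print_r(extractBetween($content));
```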

score 1 · Accepted Answer · edited Nov 23 '12 at 09:56


Use this one:

    preg_match_all('/<country.*?>(.*?)<\/country>/s', $content, $hits);
    print_r($hits);
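A short usage sketch of the pattern above, on made-up input. With the default flags, preg_match_all() fills `$hits[0]` with the full matches and `$hits[1]` with the first capture group (the tag content). It returns `false` if the regex itself fails, so checking the return value distinguishes a failed regex from zero matches.

```php
<?php
// Illustrative input; the real input would be the page text.
$content = "<country lan=\"en\">\n  dsadasd {|,''' }}|]][]\n</country>\n<country lan=\"de\">ÄÜÖ#ax</country>";

$count = preg_match_all('/<country.*?>(.*?)<\/country>/s', $content, $hits);
if ($count === false) {
    // The regex failed (for example a compile or backtrack-limit error).
    throw new RuntimeException('preg_match_all failed, error code ' . preg_last_error());
}

// $hits[1] holds one entry per <country> element: the text between the tags.
foreach ($hits[1] as $i => $body) {
    echo $i, ': ', trim($body), "\n";
}
```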

edited Nov 23 '12 at 09:56 by hakre · answered Nov 23 '12 at 09:50 by Nipun Tyagi

Thank you! After the edit it worked fine. How could I get the `lan="x"` param at the same time? – dnl Nov 23 '12 at 10:03
Do you want to get the class name of the tag? – Nipun Tyagi Nov 23 '12 at 10:16
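For the follow-up about also getting the `lan` value, one option is a second capture group for the attribute. This sketch assumes the attribute is always written as `lan="..."` with double quotes. The flag `PREG_SET_ORDER` groups each tag's captures into one array, which makes pairing language and content straightforward. The variable `$byLanguage` is illustrative.

```php
<?php
// Illustrative input with two languages.
$content = "<country lan=\"en\">English text</country>\n<country lan=\"de\">Deutscher Text</country>";

// Group 1: the lan attribute value; group 2: the tag content.
$pattern = '/<country\s+lan="([^"]*)"[^>]*>(.*?)<\/country>/s';

// PREG_SET_ORDER: one array per match, i.e. [full match, lan, content].
preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);

// Map language code => content (a later tag with the same lan overwrites an earlier one).
$byLanguage = [];
foreach ($matches as $m) {
    $byLanguage[$m[1]] = $m[2];
}
print_r($byLanguage);
```

If the same language can appear more than once on a page, collecting `$byLanguage[$m[1]][] = $m[2];` into lists avoids losing entries.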
