
I have the following pattern:

<tbody>
 <div id="aaa">Music</div>
 Ggfdlkjgfds f$5 j3k 
 <div title="Song title #1"></div>
 Fdjflkdsjfds
 <div title="Song title #2"></div>
</tbody>

And I have to extract "Song title #1" and "Song title #2" from this string.

So far I have written something like this:

(Music)(.*?)(title=\")(.*?)(\")(<\/tbody>)

But it doesn't work. How can I do that?

Thanks!

EDIT: This is not HTML, but part of the source code loaded from a Facebook user's page. There can be basically anything between those lines, so I'm only looking for three keywords:

Music
title="
</tbody>

And I want to find all the matches that follow the middle one.

khernik

2 Answers


Yet another answer :-P

Edit: Updated due to new info in question.

$str = <<<EOS
<tbody>
 <div id="aaa">Music</div>
 Ggfdlkjgfds f$5 j3k
 <div title="Song title #1"></div>
 Fdjflkdsjfds
 <div title="Song title #2"></div>
 Foobarbaz
 <div title="Song title #3"></div>
</tbody>
EOS;

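// First, capture everything between "Music" and "</tbody>".
// The "s" modifier lets "." match newlines as well.
if (preg_match('#\bMusic\b(.*?)</tbody>#s', $str, $section)) {
    // Then collect every title="..." value inside that section.
    preg_match_all('#title="(.*?)"#s', $section[1], $titles);
    print_r($titles[1]);
}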

Output:

Array
(
    [0] => Song title #1
    [1] => Song title #2
    [2] => Song title #3
)
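
A single `preg_match` call cannot return a variable number of captures from one repeated group, which is why the code above works in two steps. As an alternative sketch (not part of the original answer), the same result can probably be obtained in one pass with the `\G` anchor, which matches only where the previous match ended. The function name `extractTitlesSinglePass` is hypothetical, and the pattern assumes the three keywords appear literally as `Music`, `title="` and `</tbody>`.

```php
<?php
// Hypothetical helper: returns every title="..." value found after the
// word "Music" and before the next "</tbody>", using a single regex.
function extractTitlesSinglePass(string $text): array
{
    $pattern = '~(?:\bMusic\b|\G(?!\A))'  // start at "Music", or continue from the previous match
             . '(?:(?!</tbody>).)*?'      // skip ahead lazily, but never step over "</tbody>"
             . 'title="([^"]*)"~s';       // capture the attribute value

    preg_match_all($pattern, $text, $matches);
    return $matches[1];
}

$sample = '<tbody> <div id="aaa">Music</div> Ggf '
        . '<div title="Song title #1"></div> Fdj '
        . '<div title="Song title #2"></div> </tbody>';

print_r(extractTitlesSinglePass($sample));
```

The two-step version is easier to read and to debug; the single-pass version mainly helps when the input contains several Music sections and all of them should be scanned in one call.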
mhall

Don't use regular expressions to parse HTML; HTML is not a regular language. Use another tool, such as http://simplehtmldom.sourceforge.net/.

Useful post here on SO:

Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
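
As a rough illustration of the parser-based approach, here is a minimal sketch that uses PHP's built-in DOM extension (`DOMDocument` and `DOMXPath`) rather than simplehtmldom, so it runs without a third-party library. It assumes the input is close enough to HTML for libxml to load it, and that the `title` elements are siblings of the element containing "Music", as in the sample from the question. The function name `extractTitlesWithDom` is hypothetical.

```php
<?php
// Hypothetical helper: returns the title attribute of every <div> that
// follows the "Music" <div> under the same parent element.
function extractTitlesWithDom(string $html): array
{
    // Collect parser warnings internally instead of printing them;
    // real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: the song <div>s are siblings of the "Music" <div>.
    $nodes = $xpath->query(
        '//div[normalize-space(.)="Music"]/following-sibling::div[@title]'
    );

    $titles = [];
    foreach ($nodes as $node) {
        $titles[] = $node->getAttribute('title');
    }
    return $titles;
}

$sample = '<tbody> <div id="aaa">Music</div> Ggf '
        . '<div title="Song title #1"></div> Fdj '
        . '<div title="Song title #2"></div> </tbody>';

print_r(extractTitlesWithDom($sample));
```

If the real page nests the elements differently, the XPath expression would need to be adjusted; the regex-based answer does not depend on the nesting at all.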

Jorick Spitzen
  • good thing he's not parsing html then, he just wants to rip a value out of a chunk of text. – castis May 07 '15 at 22:37
  • As an aside, a regex is probably not a good approach here, but not for theoretical reasons (read the comments under the question you linked carefully). The claim that HTML is not a regular language is a false argument. The main problem is that there is no real reason to use a text-based approach when you are looking at a structured language and the language in use (PHP) has built-in implementations of libxml. As for simplehtmldom, I think this library is useless, slow, and not so simple (I suggest you take a look at its code). – Casimir et Hippolyte May 08 '15 at 00:25