I have a website whose HTML source looks something like this:

<li class="item" xx-href-xx="http://xx.xx/s/randomtext/randomtext?NOTradnomtext" yy-href-gg="http://xx.xx/X/RANDOMTEXTWHATIWANT/STILLRADNOMTEXTWHATIWANT?NOTradnomtext" data="212123134" data-title="TITLE">
  <a class="front" href="#" xx-href="http://xx.xx/s/randomtext/randomtext?NOTradnomtext">
    <img src="http://photo.jpg" alt="">
    <div class="cock">
        <div class="action"></div>
    </div>
  </a>
  <div class="label">
    <div>
         <h3 class="title">Example</h3>
         <p>2013-10-25 : 03:35</p>
    </div>
 </div>
</li>

... and so on, with more items using the same classes (only the titles and texts change) ...

How can I use preg_match to extract the TEXTWHATIWANT/TEXTWHATIWANT part of yy-href-gg="http://xx.xx/X/TEXTWHATIWANT/TEXTWHATIWANT?NOTradnomtext" from all of those records, and also include each record's title in the result? In this case the result should look something like this:

  • Example
    TEXTWHATIWANT/TEXTWHATIWANT

  • Example2
    TEXTWHATIWANT/TEXTWHATIWANT

and so on.

mickmackusa
sukkis
  • See [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – showdev Oct 25 '16 at 00:02
  • Okay... Is there a good way to do that, then? I can't get a JSON-formatted version of that data. Is the only way to do it manually? – sukkis Oct 25 '16 at 00:06
  • Possible duplicate of [How do you parse and process HTML/XML in PHP?](http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – chris85 Oct 25 '16 at 00:15

1 Answer

Use character classes and negated character classes to match the characters that you do or do not allow. Use \K to discard everything matched so far, so that only the desired portion of text comes back as the full-string match (no capture groups needed).
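
As a small illustration of \K together with a negated character class, here is a sketch on a made-up string (the `id=` label and the input are hypothetical, not taken from the question):

```php
<?php
// Made-up input; only the value after "id=" should come back.
$text = 'user id=12345 name=bob';

// "id=" must be present for the match to succeed, but \K drops it from the
// reported match. [^ ]+ then takes every character up to the next space.
preg_match('/id=\K[^ ]+/', $text, $m);

var_export($m[0]); // '12345'
```

Without \K, the same pattern would return 'id=12345' and a capture group would be needed to isolate the value.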

I am assuming that it is safe to treat each segment of your URL path as a run of characters that are not slashes, double quotes, or question marks.

Code:

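// The leading space anchors the match to the start of an attribute name.
// Everything before \K must match but is dropped from the returned text.
// Note: [a-z]{2}-href-[a-z]{2} also matches xx-href-xx in the sample markup;
// use the literal attribute name if only yy-href-gg should be returned.
preg_match_all(
    '# [a-z]{2}-href-[a-z]{2}="https?://[^/"?]+/[^/"?]+/\K[^/"?]+/[^/"?]+#i',
    $html,
    $matches
);
var_export($matches[0]); // flat array of "segment/segment" strings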
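
The question also asks for each item's title, which the pattern above does not return. One possible sketch swaps the regex for PHP's DOM extension, as the comments under the question suggest. It assumes that every wanted `<li>` carries a `yy-href-gg` attribute and contains one `<h3 class="title">`, and that the wanted text is always the second and third path segments. The `$html` sample is trimmed from the question, and the `title` and `path` keys are names chosen here for illustration.

```php
<?php
// Sample input trimmed from the markup in the question.
$html = <<<'HTML'
<ul>
<li class="item" xx-href-xx="http://xx.xx/s/randomtext/randomtext?NOTradnomtext" yy-href-gg="http://xx.xx/X/RANDOMTEXTWHATIWANT/STILLRADNOMTEXTWHATIWANT?NOTradnomtext" data="212123134" data-title="TITLE">
  <div class="label"><div><h3 class="title">Example</h3></div></div>
</li>
</ul>
HTML;

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$results = [];

// Visit only the <li> elements that carry the attribute of interest.
foreach ($xpath->query('//li[@yy-href-gg]') as $li) {
    // parse_url() strips the scheme, host, and query string: "/X/AAA/BBB".
    $path = (string) parse_url($li->getAttribute('yy-href-gg'), PHP_URL_PATH);
    $segments = explode('/', trim($path, '/'));

    $results[] = [
        // Text of the first <h3 class="title"> inside this <li>.
        'title' => trim($xpath->evaluate('string(.//h3[@class="title"])', $li)),
        // Skip the first segment ("X") and keep the next two.
        'path'  => implode('/', array_slice($segments, 1, 2)),
    ];
}

var_export($results);
```

Each entry of `$results` pairs a title with its path, so the list in the question can be printed with a simple `foreach`. Unlike the regex, this version does not depend on attribute order or on the exact whitespace in the markup.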
mickmackusa