I want to extract a paragraph from my website. There are more than 20 paragraph tags on the index page. The key difference is that, in each tag, the class style18 is used once and the class style19 is used 3 times. I want to find a paragraph by the content of its style18 span, e.g. "the main content".


<p class="margin">
    <span class="style18">*the main content*</span>
      » <a href="https://example1.html">
        somthing</a>

        <span class="style19">[somthing]</span>
         » <a href="https://example1.html">Town</a>

         <span class="style19">[somthing]</span>
          » <a href="https://example1.html">somthing</a>

    <span class="style19">[somthing]</span> »
    <a href="https://www.example.html">somthing</a>

    <span class="style19">[somthing]</span>

</p>

<?php
  $data = file_get_contents('https://www.example.net/index.php');

  preg_match('/<title>([^<]+)<\/title>/i', $data, $matches);
  $title = $matches[1];

  echo preg_match('/(<p)\s.+\n.+(style18).+Single\sTrack(.+)\n(.+)\n(.+)\n(.+)\n.+(style19).+\n(.+)\n(.+)\n.+(style19).+\n(.+)\n(.+)\n.+(style19).+\n(.+)\n(.+)\n.+(style19).+\n\n<\/p>/i', $data, $matches);

  $img = $matches[1];

  echo $title."<br>\n";
  echo $img;
  ?>
Aerro
  • 1
  • 1
  • 2
    Don't use regexes as parsers. `.` doesn't match newlines without the `s` modifier. Don't `echo` the result of `preg_match`; it won't be useful, because it `returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.` – user3783243 Nov 27 '18 at 16:44
  • 1
    Possible duplicate of [How do you parse and process HTML/XML in PHP?](https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php) – user3783243 Nov 27 '18 at 16:45
  • 1
    https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – AbraCadaver Nov 27 '18 at 16:51
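
The two points from the first comment can be checked with a small sketch. The sample string and variable names here are made up for illustration; only `preg_match` and the `s` modifier come from the discussion above.

```php
<?php
// Hypothetical multi-line input, similar in shape to the paragraph in the question.
$text = "<p>\nfirst line\nsecond line\n</p>";

// Without the `s` modifier, `.` stops at newlines, so the pattern cannot span lines.
$withoutS = preg_match('/<p>(.+)<\/p>/', $text, $m1);

// With the `s` modifier, `.` also matches newlines.
$withS = preg_match('/<p>(.+)<\/p>/s', $text, $m2);

// preg_match returns a match count (0 or 1), not the matched text.
echo $withoutS, "\n"; // 0
echo $withS, "\n";    // 1

// The captured text lives in the $matches array passed as the third argument.
echo $m2[1];
```

Echoing the return value only tells you whether something matched; the content itself has to be read from the matches array.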

1 Answer

Welcome to the community @Aerro.

If I understood your question correctly, you want to extract the content of a span that is surrounded by other spans following certain rules. While this could easily break your fingers with regular expressions, a tree/graph query language like XPath would be a good approach to solve this.

Have a look at e.g. http://php.net/manual/en/simplexmlelement.xpath.php
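
As a rough illustration of the XPath idea, here is a minimal sketch. It uses `DOMDocument` with `DOMXPath` rather than `SimpleXMLElement`, on the assumption that the real page is not well-formed XML. The function name `findParagraphsByStyle18`, the sample markup, and the search text are all hypothetical, and the query assumes the `class` attribute is exactly `style18` and that the span is a direct child of the `<p>`.

```php
<?php
// Hypothetical helper: returns the outer HTML of every <p> whose
// direct child <span class="style18"> contains $needle.
function findParagraphsByStyle18(string $html, string $needle): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    // The prefix hints to libxml that the input is UTF-8 (assumed here).
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $matches = [];
    // Select the style18 spans, then filter on their text in PHP.
    foreach ($xpath->query('//p/span[@class="style18"]') as $span) {
        if (strpos($span->textContent, $needle) !== false) {
            // parentNode is the enclosing <p>; serialize it with its children.
            $matches[] = $doc->saveHTML($span->parentNode);
        }
    }
    return $matches;
}

// Shortened sample markup modelled on the question (assumed structure).
$html = <<<'HTML'
<p class="margin">
    <span class="style18">the main content</span>
    <a href="https://example1.html">something</a>
    <span class="style19">[something]</span>
</p>
<p class="margin">
    <span class="style18">other content</span>
    <span class="style19">[other]</span>
</p>
HTML;

$found = findParagraphsByStyle18($html, 'the main content');
foreach ($found as $paragraph) {
    echo $paragraph, "\n";
}
```

Filtering the text in PHP instead of inside the XPath expression avoids having to escape quotes in the search string. For the live page, the `$html` variable would be filled from `file_get_contents` as in the question.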

pocketrocket
  • 365
  • 2
  • 8