
Hello, this is my code:

```php
<?php
require('/simple_html_dom.php');
$html = new simple_html_dom();
$html = file_get_html('proxys.html');

$items = array();
$re = "/<td class=\\\"t_ip\\\">\\s*((?:[0-9]{1,3}\\.){3}[0-9]{1,3})\\s*<\\/td>(?:.*?)*<td class=\"t_port\">(?:.*?)\\w+\\^\\w+\\^([0-9]{1,5})(?:.*?)<td class=\"t_type\">\\s*([0-9])(?:.*?)/";

preg_match_all($re, $html, $matches, PREG_SET_ORDER);
foreach ($matches as $val) {
    echo nl2br($val[1] . ':' . $val[2] . ' ' . $val[3] . "\n");
}

?>
```

proxys.html:

```html
<td class="t_ip">104.131.248.140</td><td class="t_port">           <script type="text/javascript">           //<![CDATA[             document.write(BigBlind^BigBlind^60088);           //]]>           </script>50088         </td><td class="t_type">     5         </td><td class="t_ip">79.101.32.14</td><td class="t_port">           <script type="text/javascript">           //<![CDATA[             document.write(Polymorth^Polymorth^1080);           //]]>           </script>45080         </td>
```

The problem is that the regex captures the number inside `document.write(BigBlind^BigBlind^60088);` (60088), so the output is:

```
104.131.248.140:    60088 5
79.101.32.14:       1080 4
```

I would like to get the value that follows `</script>` instead (50088):

```
104.131.248.140:    50088 5
79.101.32.14:       45080 4
```

I'm lost with regular expressions. Thank you for your help.
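Because the wanted port is the plain text right after `</script>`, one option is to anchor the capture there instead of on the `document.write(...)` argument. Below is a minimal sketch, assuming the markup always has the port digits directly after `</script>` and that the `t_type` cell may be missing (as for the second row of the sample). The function name `parseProxies` is hypothetical, and the sample string is the markup above with its whitespace condensed. As the comments point out, a DOM parser is the more robust choice for real pages.

```php
<?php
// Sketch: anchor the port capture on the text right after </script>,
// not on the obfuscated value inside document.write(...).
function parseProxies(string $html): array
{
    $re = '~<td class="t_ip">\s*((?:\d{1,3}\.){3}\d{1,3})\s*</td>'    // 1: IP
        . '\s*<td class="t_port">.*?</script>\s*(\d{1,5})\s*</td>'     // 2: port after </script>
        . '(?:\s*<td class="t_type">\s*(\d+))?~s';                     // 3: optional type

    preg_match_all($re, $html, $matches, PREG_SET_ORDER);

    $rows = [];
    foreach ($matches as $m) {
        // Trailing unmatched groups are left out of $m, so the type may be absent
        $rows[] = [$m[1], $m[2], $m[3] ?? null];
    }
    return $rows;
}

$html = '<td class="t_ip">104.131.248.140</td><td class="t_port"> <script type="text/javascript"> //<![CDATA[ document.write(BigBlind^BigBlind^60088); //]]> </script>50088 </td><td class="t_type"> 5 </td>'
      . '<td class="t_ip">79.101.32.14</td><td class="t_port"> <script type="text/javascript"> //<![CDATA[ document.write(Polymorth^Polymorth^1080); //]]> </script>45080 </td>';

foreach (parseProxies($html) as [$ip, $port, $type]) {
    echo $ip . ':' . $port . ' ' . ($type ?? '?') . "<br>\n";
}
```

The lazy `.*?` together with the `s` modifier stops at the first `</script>` after each port cell, and the nested `(?:.*?)*` from the original pattern is dropped because it can backtrack very heavily.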

  • Regex is not the perfect tool for parsing HTML/XML – Narendrasingh Sisodia Nov 21 '15 at 04:44
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – joce Nov 21 '15 at 04:52
  • @Joce The Zalgo post is (too) often referenced as a way to show some of the problems that may arise when parsing XHTML with regex. However, I don't think that post really answers this question. Is it even asking for the same thing, so as to mark it as a duplicate? – Mariano Nov 21 '15 at 06:27
  • @Mariano OP wants to use regex to parse HTML. The Zalgo post addresses this nonsense. Another answer could be "Don't" or "No". – joce Nov 21 '15 at 14:40
  • @Joce I agree that regex should not be used here. However, I don't think "*Don't*" is a valid answer here. I believe the OP is actually asking for a way to extract the text content of an HTML tag, and the regex is his failed attempt, while the question suggested as duplicate is asking for a way to match a tag, not the content. Some of the answers in that post (that is if you dig down) suggest using DOM. However, not a single one of them shows a way to get the text content. [From mSE FAQ](http://meta.stackexchange.com/a/10844/304899): "*questions are duplicates if they have the same answers*" – Mariano Nov 22 '15 at 09:34

1 Answer


You can try using DOMDocument, like this:

```php
$html = '<td class="t_ip">104.131.248.140</td><td class="t_port">           <script type="text/javascript">           //<![CDATA[             document.write(BigBlind^BigBlind^60088);           //]]>           </script>50088         </td><td class="t_type">     5         </td><td class="t_ip">79.101.32.14</td><td class="t_port">           <script type="text/javascript">           //<![CDATA[             document.write(Polymorth^Polymorth^1080);           //]]>           </script>45080         </td>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$root = $dom->documentElement;
$tds = $root->getElementsByTagName("td");
foreach ($tds as $td) {
    // Use only the cell's own text nodes: the document.write(...) text
    // belongs to the nested <script> element and is skipped
    $text = '';
    foreach ($td->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    echo trim($text) . "<br>";
}
```
Narendrasingh Sisodia
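A variant of the DOM approach that also pairs each IP with its port can be sketched with `DOMXPath`. It selects every `td` with class `t_ip`, takes the nearest following `td` with class `t_port`, and reads only that cell's own text nodes, so the `document.write(...)` text inside `<script>` is ignored. This assumes the `<td>` cells end up as siblings after `loadHTML()` parses the table-less fragment; the helper name `extractProxies` is hypothetical, and the sample string is the markup from the question with whitespace condensed.

```php
<?php
// Sketch: pair each t_ip cell with the t_port cell that follows it and read only
// the port cell's own text nodes. The document.write(...) text is inside
// <script>, not a direct child of the <td>, so text() does not select it.
function extractProxies(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // suppress any parser warnings about the fragment
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $proxies = [];
    foreach ($xpath->query('//td[@class="t_ip"]') as $ipCell) {
        // Nearest following sibling <td class="t_port">
        $portCell = $xpath->query('following-sibling::td[@class="t_port"][1]', $ipCell)->item(0);
        if ($portCell === null) {
            continue; // no port cell after this IP
        }
        // First non-blank direct text node of the port cell, whitespace-normalized
        $port = $xpath->evaluate('normalize-space(text()[normalize-space()][1])', $portCell);
        $proxies[] = trim($ipCell->textContent) . ':' . $port;
    }
    return $proxies;
}

$html = '<td class="t_ip">104.131.248.140</td><td class="t_port"> <script type="text/javascript"> //<![CDATA[ document.write(BigBlind^BigBlind^60088); //]]> </script>50088 </td><td class="t_type"> 5 </td>'
      . '<td class="t_ip">79.101.32.14</td><td class="t_port"> <script type="text/javascript"> //<![CDATA[ document.write(Polymorth^Polymorth^1080); //]]> </script>45080 </td>';

foreach (extractProxies($html) as $proxy) {
    echo $proxy . "<br>\n";
}
```

Selecting by class and sibling position, rather than printing every cell in order, keeps each port attached to its IP even if a row has no `t_type` cell.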