
I want to convert the Python function below to a PHP function. If someone could help a little, I'd appreciate it:

P.S.: I know that for those who have mastered the process the question may seem simple and repetitive (there are several posts about converting functions on Stack Overflow); however, for beginners it is quite complicated.

def resolvertest(url):
    if not 'http://' in url:
        url = 'http://www.exemplo.com'+url
    log(url)
    link = abrir_url(url)
    match=re.compile('<iframe name="Font" ="" src="(.*?)"').findall(link)[0]
    req = urllib2.Request(match)
    req.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36')
    response = urllib2.urlopen(req)
    link=response.read()
    response.close()
    url = re.compile(r'file: "(.+?)"').findall(link)[0]
    return url
  • What does this function do? – SuperDJ Oct 21 '17 at 14:23
  • Is it PHP or Python? – Niklesh Raut Oct 21 '17 at 14:25
  • @SuperDJ, It follows a link (through misleading advertisements) and finds the final link. – Antonio Oliveira Oct 21 '17 at 14:28
  • Sorry, but this is not how this site works, we are not here to do your work for you. You are expected to code yourself. If you run into a specific problem with that, then is the time to come here and ask a question about a specific issue with the code you yourself wrote. – arkascha Oct 21 '17 at 14:29
  • @user2486, It's in Python; I need to convert it to PHP. – Antonio Oliveira Oct 21 '17 at 14:29
  • @AntonioOliveira PHP isn't capable of scraping the web like Python can – SuperDJ Oct 21 '17 at 14:30
  • @arkascha, Sorry, but I did not mean for anyone to do the job for me. I just wanted to understand the conversion process. But if I am breaking the rules, I will delete the question. – Antonio Oliveira Oct 21 '17 at 14:55
  • There is no "conversion process", programming is not a cross compile action. You should try to understand all details of the function you look at and then implement it in php. If you run into issues, then ask :-) – arkascha Oct 21 '17 at 14:57

2 Answers


From my limited Python knowledge I'd assume this does the same:

function resolvertest($url) {
    if (strpos($url, 'http://') === false) {
        $url = 'http://www.exemplo.com' . $url;
    }
    echo $url; // or whatever log(url) does
    libxml_use_internal_errors(true); // suppress warnings on malformed HTML
    $dom = new DOMDocument;
    $dom->loadHTMLFile($url); // fetch and parse the page at $url
    libxml_use_internal_errors(false);
    $xpath = new DOMXPath($dom);
    $match = $xpath->evaluate('//iframe[@name="Font"]/@src')->item(0)->nodeValue;
    $ua = stream_context_create(['http' => ['user_agent' => 'blah']]); // placeholder user agent
    $link = file_get_contents($match, false, $ua);
    preg_match('~file: "(.+?)"~', $link, $matches); // closing quote ends the lazy capture
    return $matches[1];
}

Note that I didn't use a Regular Expression to get the iframe src, but actually parsed the HTML and used XPath. Getting the final link does use a Regex, because it seems to match some JSON and not HTML. If so, you want to use json_decode instead for more reliable results.
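
To make the two extraction steps easier to try without touching the network, here is a minimal sketch that runs them against inline strings. The sample markup, the URLs in it, and the helper names extractIframeSrc and extractFileUrl are all made up for illustration; they are not part of the original function.

```php
<?php
// Hypothetical sample markup standing in for the two fetched pages.
$outerHtml = '<html><body><iframe name="Font" src="http://www.exemplo.com/player.html"></iframe></body></html>';
$playerHtml = '<script>var cfg = {file: "http://www.exemplo.com/video.mp4", width: 640};</script>';

// Returns the src of the first <iframe name="Font">, or null if there is none.
function extractIframeSrc(string $html): ?string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//iframe[@name="Font"]/@src');
    return $nodes->length > 0 ? $nodes->item(0)->nodeValue : null;
}

// Returns the value of the first file: "..." entry, or null if none is found.
function extractFileUrl(string $text): ?string
{
    return preg_match('~file: "(.+?)"~', $text, $m) === 1 ? $m[1] : null;
}

var_dump(extractIframeSrc($outerHtml));
var_dump(extractFileUrl($playerHtml));
```

Returning null instead of indexing into an empty result keeps a missing iframe or a missing file entry from raising a notice or a fatal error, which the one-liner versions would do.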

Gordon

I created a function, getcurl($url), that routes every URL request through cURL, making it easier to read the pages and their contents.

We use a kind of loop (a recursive call) that follows the links found on each page until it reaches the final page. Once it arrives there, the if($link) branch is no longer taken, and your regex file: "(.+?)" is executed, capturing the desired content.

The script is written in a simple way.

$url = "http://www.exemplo.com/content.html";
$file_contents = getcurl($url);
preg_match('/<iframe name="Font" ="" src="(.*?)"/', $file_contents, $match_url);
$match = $match_url[1] ?? null; // null when no iframe was found

// Follow the first link on each page until a page without links is reached,
// then return the URL captured by the file: "..." pattern.
function get_redirect($link){
    $file_contents = getcurl($link);
    preg_match('/<a href="(.*?)"/', $file_contents, $match_url);
    $link = $match_url[1] ?? null;
    if($link){
        return get_redirect($link);
    }else {
        preg_match('/file: "(.+?)"/', $file_contents, $match_content_url);
        return $match_content_url[1] ?? null;
    }
}

// Fetch a URL with cURL and return the response body as a string.
function getcurl($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $response = curl_exec($ch);
    curl_close($ch);
    return $response;
}

$content = get_redirect($match);
echo $content;
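
One thing to watch with the link-following approach: if two pages link to each other, the recursion never ends. As a rough sketch (not the original script), the same idea can be written as a loop with a hop limit. The followLinks name, the $maxHops parameter, and the in-memory $pages array are assumptions made up for this example so it runs without any HTTP requests.

```php
<?php
// Follows the first <a href="..."> on each page until a page without one is
// reached, then returns that page's file: "..." value (or null).
// $fetch is any callable mapping a URL to page content; $maxHops is a
// hypothetical safety limit that is not part of the original script.
function followLinks(string $url, callable $fetch, int $maxHops = 10): ?string
{
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $html = (string) $fetch($url);
        if (preg_match('/<a href="(.*?)"/', $html, $m) === 1 && $m[1] !== '') {
            $url = $m[1]; // another hop: continue with the linked page
            continue;
        }
        // Final page: no further link, so look for the file entry.
        return preg_match('/file: "(.+?)"/', $html, $f) === 1 ? $f[1] : null;
    }
    return null; // gave up: too many hops (possible link cycle)
}

// Hypothetical in-memory "site" used instead of real HTTP requests.
$pages = [
    'http://www.exemplo.com/a' => '<a href="http://www.exemplo.com/b">next</a>',
    'http://www.exemplo.com/b' => 'file: "http://www.exemplo.com/video.mp4"',
    'http://www.exemplo.com/loop' => '<a href="http://www.exemplo.com/loop">again</a>',
];
$fetch = function (string $url) use ($pages): string {
    return $pages[$url] ?? '';
};

var_dump(followLinks('http://www.exemplo.com/a', $fetch));
var_dump(followLinks('http://www.exemplo.com/loop', $fetch));
```

Against real pages, the getcurl function above could presumably be passed as the fetcher, e.g. followLinks($match, 'getcurl'). Relative links are not resolved in this sketch.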
Florida