Hello, I'm trying to build a custom PHP script that fetches a page's Open Graph tags into an array and then prints one specific result, the og:image property. I've used the following code:

<?php
$_URL = $_GET['url']; //getting the url from THE url value
function getSiteOG( $url, $specificTags=0 ){
    $doc = new DOMDocument();
    @$doc->loadHTML(file_get_contents($url));
    $res['title'] = $doc->getElementsByTagName('title')->item(0)->nodeValue;
    foreach ($doc->getElementsByTagName('meta') as $m){
        $tag = $m->getAttribute('name') ?: $m->getAttribute('property');
        if(in_array($tag,['description','keywords']) || strpos($tag,'og:')===0) $res[str_replace('og:','',$tag)] = $m->getAttribute('content');
    }
    return $specificTags? array_intersect_key( $res, array_flip($specificTags) ) : $res;
}
$_ARRAY = getSiteOG("$_URL");
echo $_ARRAY['image'];
?>

When it is called with the following syntax, e.g. on our site:

tags.php?url=http://www.stackoverflow.com

it prints out the following result

https://cdn.sstatic.net/Sites/stackoverflow/img/apple-touch-icon@2.png?v=73d79a89bded

Which is acceptable.

The script is called from a batch file using the following method:

@echo off
PowerShell -Command "(new-object net.webclient).DownloadString('http://yoursite.com/tags.php?url=https://www.banggood.com/TKEXUN-M2-Flip-Phone-2800mAh-3_0-inch-Touch-Screen-Blutooth-FM-Dual-Sim-Card-Flip-Feature-Phone-p-1367504.html')"
PowerShell -Command "(new-object net.webclient).DownloadString('http://yoursite.com/tags.php?url=https://www.banggood.com/Xiaomi-Mi-9T-Pro-Global-Version-6_39-inch-48MP-Triple-Camera-NFC-4000mAh-6GB-64GB-Snapdragon-855-Octa-core-4G-Smartphone-p-1547570.html?ID=564486&cur_warehouse=HK')"
PowerShell -Command "(new-object net.webclient).DownloadString('http://yoursite.com/tags.php?url=https://www.banggood.com/OnePlus-7-6_41-Inch-FHD-AMOLED-Waterdrop-Display-60Hz-NFC-3700mAh-48MP-Rear-Camera-8GB-256GB-UFS-3_0-Snapdragon-855-Octa-Core-4G-Smartphone-p-1499559.html?ID=62208216150349&cur_warehouse=HK')"

This prints the resulting links on the screen, or writes them to a file when the output is redirected. It also works with a list of URLs read from a file by another batch script, but that doesn't matter here.

The problem I'm experiencing is this:

When I try to fetch the og:image of links from the Gearbest website, for example this one:

https://www.gearbest.com/headsets/pp_009839056462.html

I get no results.

I've run simple commands such as wget -qO- url, or curl -I url for the headers, and my conclusion is that the problem has something to do with how my PHP, or even curl, was compiled on the SSL side. I've read here that some sites need a newer, more secure SSL version.
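To check the SSL side from PHP itself, a small diagnostic along these lines might help. It is only a sketch: tlsBuildInfo is a hypothetical helper name, and it assumes nothing beyond the standard PHP_VERSION and OPENSSL_VERSION_TEXT constants and the cURL extension's curl_version() function, each of which is guarded in case the extension is not loaded.

```php
<?php
// Hypothetical helper: collect the TLS-related build details of this PHP install.
function tlsBuildInfo(): array {
    $info = [
        'php'      => PHP_VERSION,
        // OPENSSL_VERSION_TEXT only exists when the openssl extension is loaded.
        'openssl'  => defined('OPENSSL_VERSION_TEXT') ? OPENSSL_VERSION_TEXT : null,
        'curl'     => null,
        'curl_ssl' => null,
    ];
    // curl_version() only exists when the cURL extension is loaded.
    if (function_exists('curl_version')) {
        $v = curl_version();
        $info['curl']     = $v['version'];      // libcurl version string
        $info['curl_ssl'] = $v['ssl_version'];  // TLS library libcurl was built against
    }
    return $info;
}

print_r(tlsBuildInfo());
```

An old libcurl or TLS library in that output would point towards the SSL theory; a recent one would suggest looking elsewhere, such as the request headers.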

Note that I've also tried masquerading the wget request by changing the user agent and other cookie-related values on the fly, but still with no success.

I'm on shared hosting with access to a jailed shell that has many binary tools (sed, awk, wget, curl, etc.). The host is quite helpful and will add binaries I may need, but I still don't know how to proceed.

Any help is greatly appreciated.

Compo
SomniusX

1 Answer

You're probably being blocked because of your user-agent. I tried a curl request to Gearbest as well and got a 403 permission denied error. Akamai seems to be blocking curl's default user-agent.

But when I used something like curl -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" URL, it worked fine.
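The same idea can be carried over to the PHP script by passing a stream context to file_get_contents so that the request carries a browser-like User-Agent. This is only a sketch under assumptions: fetchHtml, extractOgImage and BROWSER_UA are hypothetical names, the User-Agent string is based on the one in the curl command above, and the 15-second timeout is arbitrary. It will not help if the request fails at the SSL handshake rather than on the user-agent check.

```php
<?php
// Browser-like User-Agent, based on the curl example above.
const BROWSER_UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36';

// Hypothetical helper: fetch a page while sending a browser User-Agent.
// Returns null when the request fails (e.g. a 403 or an SSL error).
function fetchHtml(string $url, string $userAgent = BROWSER_UA): ?string {
    // The 'http' context options are also used for https:// URLs.
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'header'  => "User-Agent: $userAgent\r\n",
            'timeout' => 15, // seconds; an arbitrary choice
        ],
    ]);
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

// Hypothetical helper: return the og:image content from an HTML string, or null.
function extractOgImage(string $html): ?string {
    if ($html === '') {
        return null; // nothing to parse
    }
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings about sloppy real-world markup
    foreach ($doc->getElementsByTagName('meta') as $m) {
        if ($m->getAttribute('property') === 'og:image') {
            return $m->getAttribute('content');
        }
    }
    return null;
}
```

In tags.php this could be used as echo extractOgImage(fetchHtml($_GET['url']) ?? '') ?? ''; which prints nothing when the fetch fails or the page has no og:image tag. Keeping the fetch and the parsing separate also makes it easy to swap in PHP's cURL functions for the fetch later.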

  • I've tried running it in a script inside my host shell, using the following as an example: curl -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36" -O "https://www.gearbest.com/headsets/pp_009839056462.html", with no success; curl doesn't give any results. curl headers may still be the way, but I need to implement this in my script above, which builds the array and keeps only the image from it, so I may need to change how the PHP in my example works and load the page through curl instead of file_get_contents. – SomniusX Sep 22 '19 at 19:44
  • Maybe it worked for you, @George, because your current setup has the latest curl build with the latest SSL. The curl I'm currently using, which gets no results, is curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.27.1 zlib/1.2.3 libidn/1.18 libssh2/1.4.2 Protocols: tftp ftp telnet dict ldap ldaps http file https ftps scp sftp Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz – SomniusX Sep 22 '19 at 19:48
  • I've just tested it with the latest curl binary for Windows, 7.66.0, which I downloaded, and made the script into a batch file. It works: with the masquerading in place, it displays the result on the screen as expected. I will try to run this version of the binary in my host shell and report back! – SomniusX Sep 22 '19 at 19:57