
I have a website containing a large DB of products and prices.
It is constantly being scraped for prices via cURL.

I thought of preventing it with a <noscript> tag, but all I can do with that is hide the content from browsers; bots would still be able to scrape it.

Is there a way of running a JS test to see if JS is disabled (to detect bots), and then redirecting or blacklisting those requests?

Will doing so block Google from crawling my website?
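
Roughly, I was imagining something along these lines (just a sketch; beacon.php, guard.php and the has_js session flag are placeholder names I made up):

<script>
  // in the product page template: only a JS-enabled browser will ever request this
  new Image().src = '/beacon.php';
</script>

<?php
// beacon.php - mark the session as having executed JavaScript
session_start();
$_SESSION['has_js'] = true;

<?php
// guard.php - included at the top of every price page: allow a few initial
// requests, then refuse any session that never triggered the beacon
session_start();
$_SESSION['hits'] = isset($_SESSION['hits']) ? $_SESSION['hits'] + 1 : 1;
if ($_SESSION['hits'] > 3 && empty($_SESSION['has_js'])) {
    http_response_code(403);
    exit;
}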

Nir Tzezana

3 Answers


Since cURL just makes a plain HTTP request, your server can't differentiate it from a browser unless you limit access to certain URLs, or check the referrer URL and filter out anything not referred locally. An example of how to build such a check can be found here:

Checking the referrer
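
For instance, a bare-bones version of that check in PHP might look like this (www.yoursite.com is a placeholder for your own domain, and, as the comments below point out, the header is easy to forge):

<?php
// reject requests whose Referer header does not point back to our own site
$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
if (parse_url($referrer, PHP_URL_HOST) !== 'www.yoursite.com') {
    http_response_code(403); // FORBIDDEN
    exit;
}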

Silvertiger
  • I can use any referer I want when sending a request. It's simply another header – PeeHaa Jun 08 '14 at 10:35
  • I didn't say it would be impossible to spoof, I said it was a viable option and one of the few, if not the only, ways to filter an incoming HTTP request. Not sure why people downvote something that is valid and helpful advice. Instead of downvoting, why not post a better solution? – Silvertiger Jun 08 '14 at 10:55
  • Well the answer is just not correct. The correct answer is: you can not. – PeeHaa Jun 08 '14 at 13:07
  • There *is* a way to do it, this just isn't it. – pguardiario Jun 09 '14 at 00:23

You can block unspoofed cURL requests in PHP by checking the User-Agent header. As far as I know, none of the search engine crawlers have "curl" in their user agent string, so this shouldn't block them.

if(stripos($_SERVER['HTTP_USER_AGENT'],'curl') !== false) {
    http_response_code(403); //FORBIDDEN
    exit;
}

Note that changing the User Agent string of a cURL request is trivial, so someone could easily bypass this.

FuzzyTree

You would need to create a block list and block those IPs from accessing the content. All headers, including the referrer and the user agent, can be set in cURL very easily, as the following simple code shows:

$agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)';
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $agent);                            // spoof the user agent
curl_setopt($ch, CURLOPT_URL, 'http://www.yoursite.com?data=anydata');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);                            // return the response instead of printing it
curl_setopt($ch, CURLOPT_REFERER, 'http://www.yoursite.com');           // spoof the referrer
$html = curl_exec($ch);
curl_close($ch);

The above will make the cURL request look like a normal connection from a browser (in this example, Internet Explorer 6).
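
A very simple version of such a block list in PHP might look like the following sketch (blocked_ips.txt is just an assumed plain-text file with one IP address per line, maintained by hand or from your access logs):

<?php
// refuse requests coming from any IP on the manually maintained block list
$blocked = file('blocked_ips.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: array();
if (in_array($_SERVER['REMOTE_ADDR'], $blocked, true)) {
    http_response_code(403);
    exit;
}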

romx