I'm using jQuery with a PHP file to display dynamic content on different URLs. The PHP file takes the URL of the requesting page (the referer) and queries the database for the matching content:

$url =  $_SERVER['HTTP_REFERER'];

Here is the jQuery:

<div id="dyncontent"></div>
<script type="text/javascript">
     $(function() {
         $.get('http://mydomain.com/content.php', function(data) {
             $('#dyncontent').html(data);
         });
     });
</script>

I know bots can fake the referer, but I'm not sure whether they can get the content of my PHP file.

Does anyone have experience with this? How can I effectively hide that piece of content from bots?

Thank you very much.

aye
  • Why not just write a `robots.txt` file? – John Dvorak Dec 10 '12 at 11:13
  • It's probably a bot with a malicious purpose, which will just ignore it. – Hans Dec 10 '12 at 11:14
  • It depends on what kind of bots he is trying to hide it from. If it's for security reasons, a malicious bot wouldn't obey robots.txt :) – cowls Dec 10 '12 at 11:14
  • Most bots visiting my site are bad bots from various IP address ranges. The content loaded by the jQuery is important, so I don't want them to crawl it. I'm OK with the other content being scraped. – aye Dec 10 '12 at 11:18
  • You might want to read this answer: https://stackoverflow.com/questions/396817/protection-from-screen-scraping Essentially there are different considerations based on your scenario. But it is very difficult to expose your data in that manner and keep it protected from bots who want to scrape it. – cowls Dec 10 '12 at 11:18

1 Answer

This is a good method for dealing with malicious bots: Protect Your Site with a Blackhole for Bad Bots

The basic premise is (quoted from the website):

...include a hidden link to a robots.txt-forbidden directory somewhere on your pages. Bots that ignore or disobey your robots rules will crawl the link and fall into the trap, which then performs a WHOIS Lookup and records the event in the blackhole data file. Once added to the blacklist data file, bad bots immediately are denied access to your site....

If a bot isn't obeying your robots.txt file, you probably don't want it on your site.
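
To make the idea concrete, here is a minimal sketch of the trap logic in JavaScript. It is not the code from the linked article: the names `TRAP_PATH`, `createBlackhole`, and `handle` are hypothetical, the blacklist is an in-memory `Set` standing in for the blackhole data file, and the WHOIS lookup is left out. It assumes the trap directory is listed under `Disallow` in robots.txt.

```javascript
// Hypothetical sketch of a "blackhole" trap; all names here are assumptions.
// TRAP_PATH is assumed to be forbidden in robots.txt and linked invisibly.
const TRAP_PATH = '/blackhole/';

function createBlackhole() {
  // In-memory stand-in for the blackhole data file described in the quote.
  const blacklist = new Set();

  // Decide what to do with a request from `ip` for `path`.
  function handle(ip, path) {
    if (blacklist.has(ip)) {
      return 'denied'; // already trapped: refuse every later request
    }
    if (path.startsWith(TRAP_PATH)) {
      blacklist.add(ip); // followed the hidden link despite robots.txt
      return 'trapped';
    }
    return 'allowed';
  }

  return { handle, blacklist };
}

// Usage: a client that requests the trap path is denied from then on.
const blackhole = createBlackhole();
console.log(blackhole.handle('203.0.113.5', '/content.php')); // allowed
console.log(blackhole.handle('203.0.113.5', '/blackhole/'));  // trapped
console.log(blackhole.handle('203.0.113.5', '/content.php')); // denied
```

Keying on the IP address is the simplest choice for a sketch, but a real deployment would need persistent storage and some way to clear false positives.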

Stu
  • If a bot is built to extract one particular piece of data, I don't think it will bother to crawl other links. – aye Dec 10 '12 at 11:33