
I am not talking about any .htaccess rewrite rules.

I am working on a web app and I want to obscure URLs from bots/scrapers - similar to the question asked here.

We need to obscure the URLs like the approach used by TripAdvisor. I tried many solutions, including the one posted in the above-mentioned question, but none of them work for me.

For example, we have a URL like example.com/file.php?u=jh843 and want to obscure it so that it becomes something like LqMWJQiMnYeVtIJpEJCIQQoqnQQxGEcQQoqnQQeVtIJpEJCIQQoqnQ or example.com/eVtIJpEJCIQQoqnQ - either way would be fine.

  • Generate random unguessable ids for your data and use that. However, the URL will still be the URL. You may not be able to just increment one number to go to the next page, but you'll still be able to visit the page. – deceze Apr 23 '15 at 11:55
  • You have a balance to strike between obfuscation and accessibility. You can use JavaScript to make it hard for scrapers to read URLs, but you may also make your site hard to navigate for people who do not use traditional browsers. – halfer Apr 23 '15 at 12:16
  • Maybe it would just be better to check hits on your site - and if a reader makes too many requests, or too many in a certain period of time, serve 404s to that IP for the following hour. Also, don't forget the Robots Exclusion Protocol. – halfer Apr 23 '15 at 12:19

1 Answer


The TripAdvisor solution looks like, on click, JavaScript decodes the string and then loads the URL (though I don't see them using it on their website now).

But modern bots can execute JavaScript.

One solution is to assign a random string to the URL on your server (for example a simple $obfuscated = base64_encode($url . $salt) that can be decoded again on the server) and store it in the session or in a database, so it can be tied to a specific user or to any access control you need, such as download counts or IP blocking.
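
A minimal sketch of that idea, assuming PHP 7+; go.php, the t parameter, and the session key are placeholder names of mine, and this uses a truly random token rather than the reversible base64 variant:

    <?php
    // Sketch of the random-string idea: map the real URL to an
    // unguessable token kept server side, and only ever print the token.
    session_start();

    function obscure_url(string $realUrl): string
    {
        $token = bin2hex(random_bytes(16));          // 32 hex chars, unguessable
        $_SESSION['url_tokens'][$token] = $realUrl;  // or persist in a database
        return 'go.php?t=' . $token;                 // what the page source shows
    }

    // The page source now only shows the token, never file.php?u=jh843.
    echo '<a href="' . obscure_url('file.php?u=jh843') . '">download</a>';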

But bots can use sessions too, so it's still a link. The only control you can add is whether it is a one-time download link or limited to a user session (only for logged-in or specific users).
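
For example, a one-time-use check in the hypothetical go.php could burn the token on first use (again just a sketch, continuing the session map from the snippet above):

    <?php
    // go.php - the token is deleted on first use, so the link
    // dies after a single download.
    session_start();

    $token   = $_GET['t'] ?? '';
    $realUrl = $_SESSION['url_tokens'][$token] ?? null;

    if ($realUrl === null) {
        http_response_code(404);             // unknown, expired, or used token
        exit;
    }

    unset($_SESSION['url_tokens'][$token]);  // burn the token
    header('Location: ' . $realUrl);         // or stream the file directly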

And block direct access to the file URL.
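
One way to do that, assuming you can move the file outside the web root (the path and the access check here are placeholders):

    <?php
    // Keep the file outside the web root and stream it from PHP,
    // so there is no direct URL to the file at all.
    session_start();

    if (!isset($_SESSION['user_id'])) {       // whatever access control you use
        http_response_code(403);
        exit;
    }

    $path = '/var/private_files/file.pdf';    // not reachable by any URL
    header('Content-Type: application/pdf');
    header('Content-Length: ' . filesize($path));
    readfile($path);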

maztch
  • Thanks for the help. The main purpose is to hide sensitive files like file.php in our example; preventing bots is just a side purpose. – sherif halim Apr 23 '15 at 12:17
  • For "legal" and friendly bots use the rel="no-follow" attribute on link and robots.txt file with disallow url. For others...maybe unusual frequency time access... – maztch Apr 23 '15 at 12:25
  • Check the example here: http://pastebin.com/6SLX85AN - how can we get the base64-encoded URL to work and show it in the source code as encoded? – sherif halim Apr 23 '15 at 12:31
  • You must use .htaccess to catch the URLs to a script and process them there (see the sketch after these comments), or send the encoded data to a specific URL (which will have the original problem). The last option is to decode on the client side using JavaScript: http://stackoverflow.com/questions/2820249/base64-encoding-and-decoding-in-client-side-javascript – maztch Apr 23 '15 at 12:43
  • I am confused - can you send working code based on the pastebin example I sent? – sherif halim Apr 23 '15 at 12:53
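
A rough sketch of the .htaccess option from the comments above, as a receiving script; router.php, the t parameter, and $salt are all assumptions, and note that raw base64 contains / and + characters, so in practice you would want a URL-safe variant:

    <?php
    // router.php - a rewrite rule (not shown) sends example.com/<token>
    // here as ?t=<token>, where the token is the base64_encode($url . $salt)
    // string from the answer.
    $salt  = 'change-me';                    // must match the encoding side
    $token = $_GET['t'] ?? '';

    $decoded = base64_decode($token, true);  // strict mode rejects junk input
    if ($decoded === false || substr($decoded, -strlen($salt)) !== $salt) {
        http_response_code(404);
        exit;
    }

    $realUrl = substr($decoded, 0, -strlen($salt)); // strip the salt suffix
    header('Location: ' . $realUrl);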