
Visitors to my page have the option to save their preferred settings as a cookie (I know some are against it, but this is not the point of this discussion).

If the user does not have a cookie, they are asked whether they want to set up their settings, and if they answer yes, they are redirected with JavaScript.

Can I detect non-human traffic and not ask them the "question"? I have noticed that Google's speed analytics is always being redirected to my settings page, which gives me wrong data on the analytics page.

So can I detect non-human traffic with PHP or JavaScript?

EDIT: I would prefer to detect them in PHP, as I plan to phase out JavaScript as much as possible.

Andreas
  • Use a CAPTCHA. Generating a CAPTCHA with client-side scripting is quite simple, but make sure that JavaScript is enabled. – Prabhat Sinha Apr 11 '16 at 07:22
  • @Prabhat, thank you for the suggestion. I will look into it. I have no previous knowledge of CAPTCHAs. – Andreas Apr 11 '16 at 07:24
  • All the search engines use specific user agents, which can be detected and dealt with in PHP. You can also set up a robots.txt excluding your settings page. – Thomas B Apr 11 '16 at 07:25
  • It's worth noting that the "robots.txt" file does not __prevent__ crawlers from visiting the listed URIs, it merely asks them politely not to do so. As a matter of fact, most illegitimate crawlers do not even look for a robots.txt file, let alone obey it. – kalatabe Apr 11 '16 at 07:51
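
Following up on the user-agent suggestion in the comments: a minimal sketch of such a check in PHP. The substring list is illustrative rather than exhaustive, user agents can be spoofed, and the `settings` cookie name is only an assumption, so treat this as a best-effort filter.

```php
<?php
// Best-effort bot detection by user agent (illustrative substrings, easily spoofed).
function looks_like_bot()
{
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? strtolower($_SERVER['HTTP_USER_AGENT']) : '';
    if ($ua === '') {
        return true; // no user agent at all is suspicious
    }
    foreach (array('bot', 'crawl', 'spider', 'slurp') as $needle) {
        if (strpos($ua, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Only ask the settings question for what looks like a real visitor without a cookie.
if (!isset($_COOKIE['settings']) && !looks_like_bot()) {
    // show the yes/no prompt here
}
```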

3 Answers


Use a honeypot: an empty, non-visible (but not `type="hidden"`) field that bots will likely fill in. You can also try to catch the click event, since bots like Google are unlikely to emulate it while crawling your page. Overall, though, your best option is to use your .htaccess file (or robots.txt) to disable crawling of the unwanted pages; check this out: Block all bots/crawlers/spiders for a special directory with htaccess
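
A minimal sketch of the honeypot idea, assuming the confirm() dialog is replaced by a plain HTML yes/no form posted back to the same PHP page; the field name `website` and the `/settings.php` URL are illustrative assumptions.

```php
<?php
// Honeypot sketch: the "website" field is hidden from humans with CSS,
// so anything that fills it in is treated as a bot.
if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    if (!empty($_POST['website'])) {
        // Honeypot filled in: almost certainly a bot, do not redirect.
        exit;
    }
    if (isset($_POST['answer']) && $_POST['answer'] === 'yes') {
        header('Location: /settings.php'); // assumed settings page URL
        exit;
    }
}
?>
<form method="post" action="">
    <!-- Hidden with off-screen CSS, not type="hidden", so bots still fill it in -->
    <div style="position:absolute; left:-9999px; overflow:hidden;" aria-hidden="true">
        <label>Leave this field empty
            <input type="text" name="website" value="" tabindex="-1" autocomplete="off">
        </label>
    </div>
    <p>Do you want to set up your settings?</p>
    <button type="submit" name="answer" value="yes">Yes</button>
    <button type="submit" name="answer" value="no">No</button>
</form>
```

On top of that, a robots.txt entry such as `Disallow: /settings.php` under `User-agent: *` asks well-behaved crawlers to stay away from the settings page entirely (the path is again an assumption).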

Velimir Tchatchevsky
  • +1, though there must be a library that does this already? Seems like a tangly thing to implement from scratch. If there isn't a library already, then @Andreas, would you open-source your solution? Because there *should* be a library to do this ;) – alexanderbird Apr 11 '16 at 07:32
  • @alexanderbird I do not use any library or anything pre-made on my pages; everything is done from scratch. My page is "only" an advanced weather page and I like doing stuff from scratch :-) – Andreas Apr 11 '16 at 10:02
  • @Velimir thanks! Not sure I want to block them completely, just make sure they do not get the question with the yes/no option. For some reason they all seem to click "yes", which means, since they don't have a cookie, they get redirected to the settings page. What do you mean by the honeypot? I will google it, I promise, but what will the bots fill this field in with? Is !="" enough? – Andreas Apr 11 '16 at 10:06
  • And can this really be used together with the JavaScript alert()-style yes/no question whose name I have forgotten? – Andreas Apr 11 '16 at 10:07
  • `!=""` will suffice; anything they put in the hidden field shows you they are a bot (no normal, human visitor will see the field, and therefore they can't fill it even by mistake). In your .htaccess and robots.txt files you can block only certain pages on your site, like the settings one, and bots won't follow the redirect. You can combine both, having your modal window redirect bots to another page that is more valuable to them. – Velimir Tchatchevsky Apr 11 '16 at 10:41
  • PS: a honeypot field is just a form field that is not visible (use CSS to hide it, not type:hidden; the latter is easily recognised by bots). You leave the value of the field empty by default, and on form submission you check whether the field is still empty. Bots will fill it in (they need to fill all the fields in order not to hit simple validation errors); users won't be able to. – Velimir Tchatchevsky Apr 11 '16 at 10:45
  • @Velimir, thank you! It sounds like a good idea. I have had plans to switch from the JavaScript confirm() message to a CSS popup window with the yes/no field, and this seems to work nicely with that idea. Just to make sure I got this correctly: hide a field, and on form submit check with JavaScript whether the field is empty; if it is, pass the user to the settings page; if the field is filled, close the popup and do nothing (or redirect somewhere else). – Andreas Apr 11 '16 at 11:05
  • Yup, only remember not to make it `input type="hidden"`, but rather use CSS `display:none;`, or even better `margin-right:-100vw;` plus `overflow:hidden;` on the parent. Also accept the answer pls =] – Velimir Tchatchevsky Apr 11 '16 at 11:23
  • @velimir thanks! :-) I always accept and upvote when I get help. Thanks. – Andreas Apr 11 '16 at 11:52

It is quite easy to do this; even so, there are many options, depending on your specific needs.

Here is a simple solution:

  • on each page, make the first link "invisible" (opacity:0) and point it to a URL that either triggers some JavaScript or leads somewhere you want robots to go; also place it off-screen (top:-999px)

  • set a timeout (like 500ms) on page load to give a robot some time to "click" the link

  • after the timeout, if the "trap" was not triggered, it should be a human user

  • optionally you can also check for mouse activity, but the above should suffice

This should work well, because a human user cannot click the link, but a bot can, since it reads the HTML. Beware not to use display:none, or the bot may skip it.
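
Since the question prefers PHP, the invisible link could point at a hypothetical trap.php that flags the session as a bot; the file name and session key are assumptions, and this is just a sketch of one way to wire the trap server-side.

```php
<?php
// trap.php - hypothetical target of the invisible link described above.
// Any client that requests this URL is flagged as a bot for the rest of its session.
session_start();
$_SESSION['is_bot'] = true;
http_response_code(204); // nothing worth rendering for the bot
```

The pages that show the yes/no prompt could then skip it whenever `$_SESSION['is_bot']` is set.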


I'd recommend using honeypots to detect them.

Here's an interesting article about this.

pguetschow
  • I will read it if I have time; it was a huge article. At least it looked huge on my phone when I opened the link. Thanks! – Andreas Apr 11 '16 at 11:54