3

Are search bots or spam bots able to emulate/trigger JavaScript events while they read the page?

Brian Tompsett - 汤莱恩
  • Under what conditions? A search bot might download your page to sniff it using expressions; if it encounters forms it might blast them with data to see what happens; if it encounters a captcha it might OCR it and see if it can outsmart it. Automation software can move the mouse and trigger clicks; that has always been the case. That's how bots work in MMORPGs and also in browsers. What is your use case - are you tightening your app? – frumbert Mar 30 '12 at 05:14
  • Well, you can see it on my website: www.omerbase.com - the Anti-Spam thing. I was wondering if a spam/search bot could figure out that a DIV is clickable and that clicking it affects the form submission... As you said, it can search the JavaScript with expressions and find a related element within the HTML, so it might trigger a click/hover/etc. event... Did I get this right? –  Mar 30 '12 at 05:16
  • @ZlatanOmerović If your application gets popular enough, it's only a matter of time before somebody clicks "View Source" and figures out what the div does on click, and writes a bot to do that continuously. So whether or not search and spam bots might trigger an action, you might as well suppose that *somebody* will write a bot to trigger that action. – Adam Mihalcin Mar 30 '12 at 05:18
  • @AdamMihalcin I know that. But please visit my page and see my implementation. There are intervals involved, and they check for any changes on that button, etc... Right now I'm figuring out ways to prove that the visitor is a real human being... –  Mar 30 '12 at 05:20

2 Answers

1

No, because search bots fetch a static HTML stream. They don't run any of the initialization functions like init() or myObj.init() that are in your JavaScript code. They don't load external libraries like jQuery, nor do they execute the $(document).ready code or any of the standard .click() listeners. So unless a search bot author has a specific reason to build their bot to execute the <script> blocks on the page, it usually won't run JavaScript code.

I've written a search bot. All that I care about is extracting the links & text from the page. However, I don't want to run someone else's client-side calendar component nor video player component. I don't want that JS code to be inserted into my database, where it could end up on the Search Engine Results Page (SERP). So there is no reason to run an eval() command on any code in the <script> blocks, nor trigger any of the initialization events in the JS layer.

When search bots load the HTML, it usually references external .js files. So executing the JS would require parsing out the URLs of multiple .js files, downloading them, building a concatenator for those files & then trying to execute everything that's been downloaded. That's extra work for a search bot author, for no net gain at all. We simply don't want that JS code to appear anywhere in our SERPs, because JS code on the SERP looks like a bad search result. However, bots can see the content of <script> tags & are only looking for links to crawl. That may be why people start to think that bots can execute JavaScript, when they are really only parsing the scripts for text links.
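As a rough illustration of the approach described above, here is a minimal Python sketch of a crawler step that pulls links and visible text out of a page without running any JavaScript. The class name LinkTextExtractor and the helper extract() are made up for this example, and the sketch simply drops script bodies rather than scanning them for link text; it is not the code of any real search bot.

```python
from html.parser import HTMLParser


class LinkTextExtractor(HTMLParser):
    """Collects href values and visible text; script bodies are skipped, never run."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.text.append(data.strip())


def extract(html):
    """Return (links, text) found in the given HTML string."""
    parser = LinkTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.links, parser.text
```

A click handler attached inside a <script> block is just ignored text to a parser like this, so nothing on the page is ever "clicked".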

Clomp
0

Here’s someone who makes the case that Google is loading pages in a headless WebKit when crawling them to get a chance to index AJAX content and for other reasons. Search bots don’t generally submit forms though.

I’ve taken a look at your site and the protection is entirely client-side. Since an HTML form really is just a description of what key/values to submit to some URL, there’s no reason anyone couldn’t just POST this data with a bot.

Example:

POST /contact
/* ... */

fullname=SO+test&email=test%40example.com&reason=test&message=test
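To show how little a bot needs, here is a hedged Python sketch that builds the same request using only the standard library. The host http://example.com is a placeholder, the helper name build_contact_post is invented for this example, and the field names are the ones from the request above. The sketch only constructs the request; actually sending it would take one more call to urllib.request.urlopen.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_contact_post(base_url, fields):
    """Build (but do not send) a form-encoded POST to the /contact path."""
    body = urlencode(fields).encode("ascii")
    return Request(
        base_url + "/contact",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder host; no browser, DOM, or JavaScript is involved at any point.
request = build_contact_post("http://example.com", {
    "fullname": "SO test",
    "email": "test@example.com",
    "reason": "test",
    "message": "test",
})
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Because nothing here loads the page, any check that lives only in the page's JavaScript never gets a chance to run.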

Also, and this is important, you are penalising legitimate visitors this way. There are all kinds of reasons why JavaScript could be blocked, fail to load, or simply not work.

Sijmen Mulder
  • Yes. You are right, except for one thing. One thing doesn't appear in the form when it first loads. It appears only once the form has been filled in and the AntiSpam button has been clicked. Check it out yourself, and inspect every input item in the form, and you'll see how I currently attempt to distinguish humans from bots. :) –  Mar 30 '12 at 06:36
  • Actually, the script failed for me which is one reason I added the warning. I clicked the anti-spam button, filled out the form, but the submit button was never enabled. Feel free to add some human-check but please don’t do it in JavaScript. – Sijmen Mulder Mar 30 '12 at 07:12
  • Lengths are specified in the descriptions of the fields... valid mail, lengths, etc... only if you had overridden the script with JS injection via Firebug or something :) –  Mar 30 '12 at 07:21
  • I see no reason to support users with JS disabled. The benefits of no spam outweigh the minuscule market that chooses to disable it. I am also looking into whether spambots trigger JS events, and so far it doesn't appear that anyone really knows. I guess I'll just have to test it then... – 3Dom Nov 16 '13 at 05:26