
I've got this setup:

  • Single-page app that generates its HTML content using JavaScript. There is no visible HTML for non-JS users.
  • History.js (pushState) for handling URLs without hashbangs. So, the app on "domain.com" loads the dynamic content for "page-id" and updates the URL to "domain.com/page-id". Also, direct URLs work nicely via JavaScript this way. A rough sketch of how this works is below.
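
To illustrate, the client-side routing looks roughly like this (a simplified sketch; renderPage and the 'home' default are placeholders, not my actual templating code):

```javascript
// On a direct hit to "domain.com/page-id" the server returns the same empty
// shell for every URL, and this bootstrap code renders the page on the client.
// renderPage() is a placeholder for the JavaScript templating layer.
document.addEventListener('DOMContentLoaded', function () {
  var pageId = location.pathname.replace(/^\//, '') || 'home';
  renderPage(pageId); // inject the generated HTML into the empty shell
});

// Internal navigation (called from the app's links/menus):
// update the URL without a hashbang, then re-render.
function navigate(pageId) {
  history.pushState({ pageId: pageId }, '', '/' + pageId);
  renderPage(pageId);
}

// Back/forward navigation should re-render the right page as well.
window.addEventListener('popstate', function (event) {
  renderPage(event.state ? event.state.pageId : 'home');
});
```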

The problem is that Google cannot execute JavaScript this way. So essentially, as far as Google knows, there is no content whatsoever.

I was thinking of serving cached content to search bots only. So, when a search bot hits "domain.com/page-id", it gets the cached content, but when a user loads the same page, they see the normal (JavaScript-injected) content.

A proposed solution for this is using hashbangs, so Google can automatically convert those URLs to alternative URLs containing an "_escaped_fragment_" query parameter. On the server side, I could then map those alternative URLs to cached content. As I won't use hashbangs, this doesn't work for me.

Theoretically I have everything in place. I can generate a sitemap.xml and I can generate cached HTML content, but one piece of the puzzle is missing.

My question, I guess, is this: how can I filter out search bot access, so I can serve those bots the cached pages, while serving my users the normal JS-enabled app?

One idea was parsing the "HTTP_USER_AGENT" string in .htaccess for any bots, but is this even possible, and wouldn't it be considered cloaking? Are there other, smarter ways?
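
For reference, the kind of .htaccess rule I had in mind looks roughly like this (only a sketch: the bot list and the /snapshots/ path are made-up examples, and whether this counts as cloaking is exactly what I'm unsure about):

```apache
# Sketch only: serve pre-rendered snapshots to a few known crawlers.
# Assumes mod_rewrite is enabled and snapshots live in /snapshots/<page-id>.html.
RewriteEngine On

# Match some well-known bot user agents (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|yandex|baiduspider) [NC]
# Only rewrite when a snapshot file actually exists for the requested page.
RewriteCond %{DOCUMENT_ROOT}/snapshots/$1.html -f
RewriteRule ^([\w-]+)/?$ /snapshots/$1.html [L]
```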

Monokai
  • Google specifically forbids serving them different content than users: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=66355 Their advice is to place content in a `<noscript>` tag. – Purpletoucan Dec 11 '12 at 11:37
  • I think [these SO answers](http://stackoverflow.com/q/7549306/344480) give you all the information you need. – Matthias Dec 11 '12 at 11:39

1 Answer


> updates the URL to "domain.com/page-id". Also, direct URLs work nicely via JavaScript this way.

That's your problem. The direct URLs aren't supposed to work via JavaScript. The server is supposed to generate the content.

Once whatever page the client has requested is loaded, JavaScript can take over. If JavaScript isn't available (e.g. because the visitor is a search engine bot), you should have regular links / forms that continue to work; if JS is available, you bind to their click/submit events and override the default behaviour.
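
For example (just a sketch; loadContent stands in for whatever client-side rendering you use), the enhancement layer might look like this:

```javascript
// The markup contains ordinary links that the server can also answer:
//   <a href="/page-id">Page</a>
// Without JS (or for a search engine bot) the link triggers a normal page
// load; with JS we override the click and render on the client instead.
// loadContent() is a placeholder for your client-side templating.
var links = document.querySelectorAll('a[href^="/"]');

Array.prototype.forEach.call(links, function (link) {
  link.addEventListener('click', function (event) {
    event.preventDefault(); // override the default full page load
    history.pushState(null, '', link.getAttribute('href'));
    loadContent(link.getAttribute('href'));
  });
});

// Back/forward should re-render whatever path is now in the address bar.
window.addEventListener('popstate', function () {
  loadContent(location.pathname);
});
```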

> A proposed solution for this is using hashbangs

Hashbangs are an awful solution. pushState is the fix for hashbangs, and you are already using it - you just need to use it properly.

> how can I filter out search bot access

You don't need to. Use progressive enhancement / unobtrusive JavaScript instead.

Quentin
  • The progressive enhancement idea is valid in the case of server-side rendering, but I'd like to use a JavaScript templating system, so the complete rendering flow happens on the client side. What I CAN do, however, is render HTML snapshots and serve those to search bots. This is also something Google recommends (using hashbangs, however), but then my question still remains: is there really no solution for automatically serving search bots the snapshots while serving all other users the JS version? – Monokai Dec 14 '12 at 10:34
  • You can't reliably detect a search bot, and there's rarely any reason to exclude users who happen to have JS disabled. Progressive enhancement doesn't stop you using a JavaScript template system on the client, it doesn't even stop you using one on the server - e.g. [mojito](http://developer.yahoo.com/cocktails/mojito/) uses the same templates server side and client side. – Quentin Dec 14 '12 at 10:38