7

I'm reading Google's specification for AJAX crawling. I understand the concept, but I need some clarification.

My URLs all look like this:

http://www.website.com/#!/eng/home
http://www.website.com/#!/eng/contacts
...

I have to provide the HTML snapshots at these addresses:

http://www.website.com/?_escaped_fragment_=/eng/home
http://www.website.com/?_escaped_fragment_=/eng/contacts
...

Is this correct? Or should I remove the leading "/" in the "_escaped_fragment_" URL (e.g. http://www.website.com/?_escaped_fragment_=eng/home), or do something else?

I generate the HTML snapshots with PhantomJS, but what is the best way to serve these snapshots to the crawler? Using Node.js? Using .htaccess rewrite rules?

Kara
Cereal Killer

2 Answers

5

OK, since I finally solved this, I would like to share the approach I found.

First of all, the HTML snapshot must be provided to the crawler at a specific URL where

?_escaped_fragment_=

replaces

#!

So if you have:

http://www.website.com/#!/eng/home

your server must provide the snapshot at:

http://www.website.com/?_escaped_fragment_=/eng/home
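
The mapping above can be sketched as a small Node function. This is only an illustration: `toEscapedFragmentUrl` is a hypothetical helper name, and the set of characters it percent-encodes (control characters and space, `#`, `%`, `&`, `+`, and anything from `0x7F` up) is my reading of Google's AJAX crawling scheme, so check it against the specification before relying on it. Note that "/" is left untouched, which is why the leading slash stays in the snapshot URL.

```javascript
'use strict';

// Hypothetical helper: turn a "#!" (hashbang) URL into the
// "?_escaped_fragment_=" URL that the crawler is expected to request.
function toEscapedFragmentUrl(prettyUrl) {
  const idx = prettyUrl.indexOf('#!');
  if (idx === -1) return prettyUrl; // no hashbang, nothing to map

  const base = prettyUrl.slice(0, idx);
  const fragment = prettyUrl.slice(idx + 2);

  // Assumed escape set: percent-encode only characters that would break
  // the query string; "/" and other ordinary characters pass through.
  const escaped = fragment.replace(
    /[\x00-\x20#%&+\x7f-\u{10ffff}]/gu,
    (c) => encodeURIComponent(c)
  );

  // Append to an existing query string if the base URL already has one.
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + '_escaped_fragment_=' + escaped;
}

console.log(toEscapedFragmentUrl('http://www.website.com/#!/eng/home'));
```

URLs without a hashbang are returned unchanged, so the function can be run over a whole list of site URLs.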

If anyone is interested in how I generate the snapshots: I simply use a Node module called judo (https://npmjs.org/package/judo). To use it, you need PhantomJS (http://phantomjs.org/) and Node (http://nodejs.org/) installed on your server. (More information about installing PhantomJS on the server: How can I setup & run PhantomJS on Ubuntu?)

Once everything is installed, you just need to write a JS file that uses judo (e.g. judo.js); if you follow the doc page linked above, you will be ready in 5 minutes. Upload the file to the server and run it with Node to create the snapshots and the sitemap.

After this, you need to serve the HTML snapshots to Google's crawler when it asks for ?_escaped_fragment_= URLs. The simplest way, in my opinion, is an .htaccess file; you need just 3 lines of code, which in my case are:

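# Enable mod_rewrite
RewriteEngine On
# Match crawler requests such as ?_escaped_fragment_=/eng/home and capture "eng/home" as %1
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/(.*)$
# For the site root, serve the matching pre-rendered snapshot
RewriteRule ^$ /seo/snapshots/%1.html [L]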

(since my judo.js file creates the snapshots in the /seo/snapshots directory)
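
For anyone who would rather do this step in Node than in .htaccess, here is a rough sketch of the same lookup. It is not what I run in production: `snapshotPathFor` and `SNAPSHOT_DIR` are hypothetical names, and the directory is assumed to match the /seo/snapshots path above. `URLSearchParams` percent-decodes the value, which takes care of the URL-encoding that the comment under this answer mentions, and the function rejects "../" tricks so a request cannot escape the snapshot directory.

```javascript
'use strict';
const path = require('path');

// Assumed snapshot location, relative to the site root (as in the rule above).
const SNAPSHOT_DIR = '/seo/snapshots';

// Hypothetical helper: given a request URL (path plus query string), return
// the snapshot file to serve, or null if this is not a snapshot request.
function snapshotPathFor(requestUrl) {
  // The base is only there so that relative request URLs can be parsed.
  const url = new URL(requestUrl, 'http://localhost');
  if (url.pathname !== '/') return null; // like "RewriteRule ^$": site root only

  // searchParams.get() returns the value already percent-decoded.
  const fragment = url.searchParams.get('_escaped_fragment_');
  if (fragment === null) return null;

  const relative = fragment.replace(/^\/+/, ''); // drop the leading "/"
  if (relative === '') return null;

  const resolved = path.posix.normalize(
    path.posix.join(SNAPSHOT_DIR, relative + '.html')
  );
  // Refuse anything that resolves outside the snapshot directory.
  if (!resolved.startsWith(SNAPSHOT_DIR + '/')) return null;
  return resolved;
}

console.log(snapshotPathFor('/?_escaped_fragment_=/eng/home'));
```

In a real server you would join the returned path with your document root and send the file from the request handler when the result is not null, and fall through to the normal page otherwise.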

Finally, you can check that everything works using the "Fetch as Google" option in the Google Webmaster Tools panel; if you did everything correctly, the result will be the HTML snapshot.

Community
Cereal Killer
  • An additional thing to consider is that the _escaped_fragment_ value will be URL-encoded and therefore needs to be decoded to create the 'pretty URL' for PhantomJS to render. I recently launched crawlspa.com, which provides everything as a service. – DanS Nov 16 '13 at 10:32
1

Usually I don't answer SO posts by suggesting a paid service, but in this case I think you should really consider using BromBone - http://www.emberjsseo.com

Mike Grassotti
  • Thanks for your suggestion, but I'm interested in learning this; starting from scratch, step by step, I understood how to create the snapshots, and now I would like to work out this final step... – Cereal Killer Oct 30 '13 at 23:03