I have added the spiderable package to my Meteor app, and the HTML version of the page is returned when making requests with ?_escaped_fragment_= in the URL, but I'm unable to get Google to crawl the site.

Details

When using Fetch as Google in Google Webmaster Tools and requesting the root page "http://example.com/", the page returned is the JavaScript version, something like:

HTTP/1.1 200 OK
content-type: text/html; charset=utf-8
date: Fri, 30 Nov 2012 05:39:36 GMT
connection: Keep-alive
transfer-encoding: chunked

<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" href="/e83157bdc4ff057fa3a20b82af4c11b4ebe776e7.css">
    <script type="text/javascript">
      __meteor_runtime_config__ = {"ROOT_URL":"http://www.example.com","DEFAULT_DDP_ENDPOINT":"https://www-example-com-ddp.meteor.com/"};
    </script>
    <script type="text/javascript" src="/13cf3d21ce1c4a88407ca5f3c250f186ab1738f9.js"></script>
    <meta name="fragment" content="!">
    <title>example.com</title>
  </head>
<body>
</body>
</html>

If I instead request http://example.com/?_escaped_fragment_= the HTML version is returned:

HTTP/1.1 200 OK
content-type: text/html; charset=UTF-8
date: Wed, 05 Dec 2012 02:44:09 GMT
connection: Keep-alive
transfer-encoding: chunked

<!DOCTYPE html>
<html>
  <head>
    <link rel="stylesheet" href="/e83157bdc4ff057fa3a20b82af4c11b4ebe776e7.css">
    <title>example.com</title>
    <meta name="viewport" content="initial-scale=1.0">
  </head>
  <body>
    <ul>
      <li><a href="/">Home</a></li>
      <li><a href="/one">One</a></li>
      <li><a href="/two">Two</a></li>
    </ul>
  </body>
</html>

Questions

  • How do you tell Google to add the ?_escaped_fragment_= to the url, so that it renders the html version?

  • Will Google still add the ?_escaped_fragment_= to the URL if the URLs do not have hashbangs (#!)? i.e. /home, /products/1 instead of /#!home, /#!products/1?

  • How do you make Google follow the linked pages and append the ?_escaped_fragment_=? All of the JS versions of the pages have <meta name="fragment" content="!"> in the head. I assumed that was all that was required.

It seems that the simplest solution would be to update the spiderable package to return the HTML version to Googlebot instead of requiring ?_escaped_fragment_=, but if this is working for others, I'm curious as to what I'm doing wrong.

Additional Info

Meteor's spiderable package is a temporary solution to allow web search engines to index Meteor applications.

According to the source it does a few things:

  1. It adds the following tag to the head section of the JS version of the page:

    <head><meta name="fragment" content="!"></head>

  2. Using PhantomJS, it parses the JavaScript application and returns an HTML version when either of the following conditions is met:

    a. The requesting user agent is "facebookexternalhit"

    b. The requested url contains the string ?_escaped_fragment_=
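The two conditions listed above can be sketched as a small predicate. This is a minimal sketch, not spiderable's actual source: the names `shouldServeSnapshot` and `SNAPSHOT_AGENTS` are hypothetical, and matching `_escaped_fragment_=` as a query parameter (after `?` or `&`) is an assumption that is slightly broader than the literal string check described above.

```javascript
// Hypothetical helper illustrating the two conditions; not spiderable's real code.
// User agents that should always receive the rendered HTML (assumed list).
const SNAPSHOT_AGENTS = [/facebookexternalhit/i];

function shouldServeSnapshot(requestUrl, userAgent, agentPatterns = SNAPSHOT_AGENTS) {
  // Condition b: the URL carries the _escaped_fragment_ query parameter.
  const hasEscapedFragment = /[?&]_escaped_fragment_=/.test(requestUrl);
  // Condition a: the user agent matches one of the configured patterns.
  const agentMatches = agentPatterns.some((re) => re.test(userAgent || ""));
  return hasEscapedFragment || agentMatches;
}

console.log(shouldServeSnapshot("/?_escaped_fragment_=", "Mozilla/5.0")); // true
console.log(shouldServeSnapshot("/", "facebookexternalhit/1.1"));        // true
console.log(shouldServeSnapshot("/", "Googlebot/2.1"));                   // false
```

Under this sketch, a plain request from Googlebot gets the JavaScript version, which matches the Fetch as Google output shown earlier. Passing an extra pattern such as `[/googlebot/i]` as the third argument would model the "return the HTML version to Googlebot" idea from the question.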

Braiam
Kyle Finley

3 Answers

I believe this to be a "Google Webmaster Tools" bug.

It seems that Google is indeed crawling the site: the pages are showing up in Google results. Yet Google Webmaster Tools still lists total indexed pages as 1. Bing still isn't crawling the page, however.

EDIT: In Google Webmaster Tools, the pages are listed as:

Not selected: Pages that are not indexed because they are substantially similar to other pages, or that have been redirected to another URL. More information.

EDIT2: In response to Jonatan's question:

Will Google still add the ?_escaped_fragment_= to the URL if the URLs do not have hashbangs (#!)?

Yes. My application does not use hashbangs (#!) in its URLs, and Googlebot still appends ?_escaped_fragment_= when crawling. Here's an example from the logs:

INFO HIT /url/2/01 66.249.72.42
INFO HIT /url/2/01?_escaped_fragment_= 66.249.72.142
INFO HIT /url/2/01 108.162.222.82
INFO HIT /url/2/01?_escaped_fragment_= 108.162.222.82
INFO HIT /url/2/05 108.162.222.82
INFO HIT /url/2/05?_escaped_fragment_= 108.162.222.214

It appears that Googlebot will try the URL both with and without the ?_escaped_fragment_=.
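One way to check this pattern in a longer log is to count, per path, how many hits arrived with and without the parameter. A rough sketch, assuming the `INFO HIT <url> <ip>` line format from the excerpt above (other log formats would need a different pattern); `summarizeHits` is a hypothetical helper name:

```javascript
// Count plain vs. _escaped_fragment_ hits per path.
// Assumes each relevant line looks like: "INFO HIT <url> <ip>".
function summarizeHits(logText) {
  const byPath = new Map();
  for (const line of logText.split("\n")) {
    const match = /^INFO HIT (\S+) (\S+)$/.exec(line.trim());
    if (!match) continue; // skip lines in any other format
    const url = match[1];
    const path = url.split("?")[0];
    const entry = byPath.get(path) || { plain: 0, escaped: 0 };
    if (url.includes("_escaped_fragment_=")) {
      entry.escaped += 1;
    } else {
      entry.plain += 1;
    }
    byPath.set(path, entry);
  }
  return byPath;
}

const sampleLog = [
  "INFO HIT /url/2/01 66.249.72.42",
  "INFO HIT /url/2/01?_escaped_fragment_= 66.249.72.142",
  "INFO HIT /url/2/05 108.162.222.82",
  "INFO HIT /url/2/05?_escaped_fragment_= 108.162.222.214",
].join("\n");

for (const [path, counts] of summarizeHits(sampleLog)) {
  console.log(path, counts);
}
```

A path with plain hits but no escaped hits would suggest the crawler never asked for the HTML version of that page.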

Kyle Finley
  • Your question contained more questions than this answer answers. Why was it marked as correct? Can you add the other answers? In particular, I'm looking for the answer to `Will Google still add the ?_escaped_fragment_= to the url, if the urls do not have hashbangs (#!)?`. – Jonatan Littke Jun 02 '13 at 19:00
  • @JonatanLittke I've updated the answer. Anecdotally, Googlebot will first try the URL without the `?_escaped_fragment_` and then with it. – Kyle Finley Jun 02 '13 at 20:40
  • Thanks! This is probably because of the fragment meta tag, I'd assume. I don't think they do this on every other site, surely. – Jonatan Littke Jun 03 '13 at 08:06
  • So why do you need the `?_escaped_fragment_=`? Without it, the HTML isn't rendered in the response from `curl`. – Chet Jun 10 '14 at 21:57
  • @Chet, the spiderable package only sends the rendered HTML when `?_escaped_fragment_=` is present in the URL. Rendering the HTML is a slow process, so it doesn't do it for every request, only for clients that can't render the JavaScript themselves. Does that answer your question? – Kyle Finley Jun 11 '14 at 02:48

Any page which does not have a hash fragment starting with #!, such as the home page, needs this:

 <meta name="fragment" content="!">

This tells the crawler to fetch the ugly URL (the one with _escaped_fragment_=). It goes in the <head> section.

UPDATE: I noticed that, according to the description of the package at the end of your question, the above meta tag is already added. You can check that it's included in your page by viewing the source.

Normally, all pages other than the home page should have something like www.yoursite.com/#!hashfragment as the pretty URL, where the ! after the hash (#) signals the crawler, so you don't need to include the meta tag mentioned above.
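To make the pretty-to-ugly mapping concrete, here is a minimal sketch of how a crawler following the AJAX crawling scheme would rewrite a URL. The function names `toUglyUrl` and `escapeFragment` and the `hasFragmentMeta` flag are hypothetical, and the set of escaped characters is my reading of the specification linked in the comments, so treat it as an approximation:

```javascript
// Percent-escape only the characters the scheme treats as special in the
// fragment value (control chars and space, #, %, &, +, and anything >= 0x7F).
function escapeFragment(fragment) {
  let out = "";
  for (const ch of fragment) {
    const code = ch.codePointAt(0);
    const special = code <= 0x20 || code >= 0x7f || "#%&+".includes(ch);
    out += special ? encodeURIComponent(ch) : ch;
  }
  return out;
}

// hasFragmentMeta: assumed flag meaning the page contains
// <meta name="fragment" content="!"> in its head.
function toUglyUrl(prettyUrl, hasFragmentMeta = false) {
  const hashIndex = prettyUrl.indexOf("#!");
  let base;
  let fragment;
  if (hashIndex !== -1) {
    // Hashbang URL: everything after #! becomes the parameter value.
    base = prettyUrl.slice(0, hashIndex);
    fragment = prettyUrl.slice(hashIndex + 2);
  } else if (hasFragmentMeta) {
    // No hashbang, but the meta tag opts the page in with an empty value.
    base = prettyUrl.split("#")[0];
    fragment = "";
  } else {
    return prettyUrl; // nothing signals the crawler, so the URL is unchanged
  }
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + escapeFragment(fragment);
}

console.log(toUglyUrl("http://www.yoursite.com/#!hashfragment"));
// http://www.yoursite.com/?_escaped_fragment_=hashfragment
console.log(toUglyUrl("http://example.com/one", true));
// http://example.com/one?_escaped_fragment_=
```

The second call corresponds to the situation in the question: no hashbang in the URL, but the meta tag is present, so the crawler requests the same path with an empty `_escaped_fragment_=` parameter.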

mastazi
  • more info here: https://developers.google.com/webmasters/ajax-crawling/docs/specification – mastazi Dec 07 '12 at 09:41
  • Thank you for the link. Everything you've mentioned is correct. I now believe Google is crawling the site, but that the results just aren't showing up in Webmaster Tools. – Kyle Finley Dec 07 '12 at 16:15

I know this question has already been answered, but for people coming to it from Google, I wanted to include this screencast on the topic.

This helped me understand the Meteor spiderable package: https://www.eventedmind.com/tracks/feed-archive/meteor-the-spiderable-package

cmather
ericraio