
Recently I've seen articles stating that Google now crawls sites and renders CSS and JavaScript. Example article by Google themselves: http://googlewebmastercentral.blogspot.co.uk/2014/05/understanding-web-pages-better.html

I have a single page application set up in Angular with HTML5 mode on the routing. An ng-view in my index.html is populated based on the URL like so:

app.config(function($routeProvider, $locationProvider){

  $locationProvider.html5Mode(true);

  $routeProvider.when("/", {
    templateUrl: "/views/dashboard.html"
  }).when("/portfolio", {
    templateUrl: "/views/portfolio.html"
  });

});

Google should now go to www.example.com/portfolio, execute the JavaScript which brings in the content from views/portfolio.html, and be able to read all that content, right?

That's what should happen according to those articles I've read. This one in particular explains it in detail regarding Angular: https://weluse.de/blog/angularjs-seo-finally-a-piece-of-cake.html

Here's the problem. When I use Google Webmaster Tools and the Fetch or Fetch and Render functionality to see how Google sees my pages, it doesn't render the JS and just shows the initial HTML from my index.html.

Is it working? Have I done something wrong? How can I test it?

CaribouCode
  • Have you added the `.hashPrefix('!')` and `<meta name="fragment" content="!">` pieces? They are necessary for the `_escaped_fragment_` piece that bots require. See https://developers.google.com/webmasters/ajax-crawling/docs/specification – PlantTheIdea Mar 02 '15 at 16:25
  • @PlantTheIdea I switched to HTML5 mode (without the hashPrefix) when I read the second article in my question. The blogger there explains how you can use HTML5 mode and Google will still execute the JS. Not sure if that's true or not. – CaribouCode Mar 02 '15 at 16:28
  • Also, surely it must be possible to have pretty URLs but still index on Google? Seems ludicrous not to. – CaribouCode Mar 02 '15 at 16:29
  • There are a number of ways to do it, but the most bulletproof way to do it is using that fragment. It is an identifier to Google (as well as other search bots) to apply the escape fragment to the site. I just recently set up a site using Angular and, with the `<meta name="fragment" content="!">` tag mentioned, had no problem rendering on Webmaster Tools. – PlantTheIdea Mar 02 '15 at 17:14
  • @PlantTheIdea So the only safe way to get indexed on Google from a MEAN stack SPA is to ruin your site URLs with `/#!/` in every one? That seems a little insane to me and would potentially be enough to put a lot of people off using this tech. – CaribouCode Mar 02 '15 at 18:10
  • No no no ... just because you also incorporate the `.hashPrefix()` doesn't mean you're locked in. I'll give an answer to give more context. – PlantTheIdea Mar 02 '15 at 19:37

1 Answer


As I mentioned in the comments, hopefully this answer gives more context to what I mean.

When you declare html5Mode, also include the hashPrefix:

$locationProvider
    .html5Mode(true)
    .hashPrefix('!');

Then, in your <head>, include this tag:

<meta name="fragment" content="!">

What happens here is that you are providing a fallback for browsers without the History API. All users visiting with compliant browsers (basically everything nowadays) will see this:

http://example.com/home/

And only on dinosaur browsers like IE9 would they see this:

http://example.com/#!/home/

Now, that is in real life with actual people as visitors. You asked specifically about being indexed by Google, which uses bots. A bot will try to go to example.com/home/ as an actual destination on your server (meaning /home/index.html), which obviously doesn't exist. By providing the <meta> tag above, you have hinted to the bot that it should instead request an ?_escaped_fragment_ version of the page (like /home/?_escaped_fragment_=) and associate the result with the clean /home/ URL in the actual Google searches.
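
The URL rewrite the crawler applies can be sketched as a small function. `escapedFragmentUrl` is a name I made up for illustration; the rewrite rules themselves come from Google's AJAX-crawling specification:

```javascript
// Sketch of the crawler's URL mapping under the AJAX-crawling scheme.
function escapedFragmentUrl(url) {
  const hashbang = url.indexOf('#!');
  if (hashbang !== -1) {
    // Hashbang URLs: example.com/#!/home -> example.com/?_escaped_fragment_=%2Fhome
    const base = url.slice(0, hashbang);
    const fragment = url.slice(hashbang + 2);
    const sep = base.includes('?') ? '&' : '?';
    return base + sep + '_escaped_fragment_=' + encodeURIComponent(fragment);
  }
  // Pages opting in via <meta name="fragment" content="!"> (no hash at all):
  // example.com/home/ -> example.com/home/?_escaped_fragment_=
  return url + (url.includes('?') ? '&' : '?') + '_escaped_fragment_=';
}

console.log(escapedFragmentUrl('http://example.com/#!/home'));
// http://example.com/?_escaped_fragment_=%2Fhome
console.log(escapedFragmentUrl('http://example.com/home/'));
// http://example.com/home/?_escaped_fragment_=
```

The second branch is the one that applies here: with html5Mode there is no hash fragment, so the clean path survives and only an empty `_escaped_fragment_` parameter is appended.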

This all happens on the backend: all visitors to your site will still see the clean URL. It is only necessary because under the hood Angular falls back to location.hash, and the hash fragment is never sent to the server. Bottom line, your actual users will be unaffected and will not see the ugly URL, unless they're on a browser that does not support the History API. For those users, all you've done is make the site start working (as before it would have been broken).

Hope this helps!

UPDATE

Since you are using a MEAN stack, you can also go a different direction that has been around a long time: HTML snapshots. There are npm packages that will generate snapshots (meaning static HTML captured after the page renders) that can be served from your server at the corresponding locations. That technique is a little outdated, but it's been around since about 2012 and is proven to work.

Back when I did it, I used grunt-html-snapshot, but there are others out there. You can even use PhantomJS to make the snapshots, although I never did that myself.

PlantTheIdea
  • Thanks for the response. This seems logical and I haven't yet tried HTML5 mode and hashPrefix together. I'll try this first thing in the morning and then respond. – CaribouCode Mar 03 '15 at 00:25
  • So just wondering, in my MEAN stack situation, will I have to do HTML snapshots? Will I also need to route `?_escaped_fragment_` requests to the correct locations in my Node backend? I was trying to avoid doing snapshots on the assumption that Google executes the JavaScript and therefore can access the content. Is that not so? – CaribouCode Mar 03 '15 at 10:45
  • Once I did the `?_escaped_fragment_` method I no longer had to do HTML snapshots; it should be one or the other, not both. – PlantTheIdea Mar 03 '15 at 11:59
  • Very interesting indeed. So just to clarify, if I use HTML5 mode and hashPrefix together, then add the meta fragment tag, Google will understand it needs to render the JavaScript to get the content? And the user will go to `www.example.com/portfolio` but Google will go to `www.example.com/portfolio?_escaped_fragment_`? – CaribouCode Mar 03 '15 at 12:04
  • Many articles (including Google themselves) are still saying you need to do HTML snapshots for any AJAX content. Comments on this Stack Overflow answer also explain that other search engines definitely do not render JS: http://stackoverflow.com/questions/13499040/how-do-search-engines-deal-with-angularjs-applications. If this is the case, and I need to do HTML snapshots with HTML5 mode, how would the `?_escaped_fragment_` URL look? Would I need to detect that and redirect it to the snapshots? – CaribouCode Mar 03 '15 at 13:53
  • First off, you asked about Google specifically, so I provided the answer for that ... other search engines are a different ballgame. Secondly, if you want a bulletproof, streamlined solution for your MEAN stack using HTML snapshots, I recommend Prerender.io (https://prerender.io/). It's paid if you have a site with over 250 pages, but it is a really solid end-to-end solution. – PlantTheIdea Mar 03 '15 at 13:59
  • OK. I'm just wondering, if I use the method you suggest for HTML5 mode and hashPrefix together, what URL would Google go to (so that I can redirect it to the snapshot)? If the URL was `www.example.com/portfolio`, would it be `www.example.com/?_escaped_fragment_=/portfolio` or would it be `www.example.com/portfolio/?_escaped_fragment_=`? I know from using hashPrefix in the past that it changed the /#!/ part of the URL to `?_escaped_fragment_` – CaribouCode Mar 03 '15 at 14:14
  • I'm fairly certain it goes to the former, although there is a trailing underscore ... `example.com/?_escaped_fragment_=/portfolio`. Something like that – PlantTheIdea Mar 03 '15 at 14:40
  • As this answer is 5 years old - is there current guidance on best practice to follow now in 2020? My AngularJS site has no AngularJS content in the Google cached version of my page and I would like to fix it - preferably as easily as possible, without having to create a snapshot of each page. – john blair Apr 11 '20 at 13:01