There are numerous resources out there for implementing SEO-friendly versions of AngularJS applications, of course. Despite reading all of them numerous times, I'm still a bit unclear on a few things, particularly regarding the distinction between the hashbang and HTML5 mode models:

  1. For hashbang (#!, or "HTML4") apps, the following setting is given on the location provider:

    $locationProvider.hashPrefix('!');
    

    Is this setting required for HTML5 mode as well? Why or why not?

  2. For HTML5 mode apps, the following meta tag is included in the index.html page:

    <meta name="fragment" content="!">
    

    Is this meta tag required for hashbang apps as well? Why or why not?

  3. Using HTML5 mode, my URLs look similar to:

    http://sample.com/landing/home
    

    Even with the meta tag from #2 specified in my index.html, I'm still unable to navigate to my URLs as a crawler would, such as to:

    http://sample.com/#!/landing/home
    

    Is this normal? Should I expect to be able to navigate to my app hashbang-style, if it's an HTML5 mode app, after adding the location provider settings and/or meta tag?


More than anything, I guess my actual question would be: what's specifically required for HTML5 mode crawling, and what's specifically required for hashbang-style crawling? How do they overlap? Additionally, how does the HTML5 mode configuration actually work, behind the scenes, if no hashbang-style route is ever produced/usable?

Note that these questions are separate from the issue of generating/serving snapshots, which I generally understand.

AngularJS SEO-friendly configuration generally makes sense when it comes to classical hashbang-style apps, but for HTML5 mode, I'm a bit confused. Would love some clarity.

J. Ky Marsh

1 Answer

Answers

  1. Hashbang isn't required for HTML4 either. But if you want to implement SEO, it's a good idea to use it, because search bots will recognize those links and request a different URL:

    original

    http://somesite.com/#!/crazy/101
    

    bot:

    http://somesite.com/?_escaped_fragment_=/crazy/101
    
  2. The meta tag is included so that search bots automatically append ?_escaped_fragment_= to their requests. Since a bot can't know which part of the URL actually belongs to the SPA, the value will be empty.

    original with meta tag:

    http://somesite.com/crazy/101
    

    bot:

    http://somesite.com/crazy/101?_escaped_fragment_=
    
  3. See #2
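
The two rewrites in #1 and #2 can be sketched as a small function. This is only an illustration of what the crawler does on its side, not code you would ship: `toCrawlerUrl`, `separator` and `escapeFragment` are made-up names, and the escaping is simplified to a handful of characters (an assumption, the real scheme escapes more).

```javascript
// Hypothetical helper: models how a crawler following the AJAX crawling scheme
// rewrites a page URL before requesting it. Not part of AngularJS.
function toCrawlerUrl(pageUrl, hasFragmentMetaTag) {
  var bangIndex = pageUrl.indexOf('#!');
  if (bangIndex !== -1) {
    // Hashbang URL: everything after "#!" moves into the query parameter.
    var base = pageUrl.slice(0, bangIndex);
    var fragment = pageUrl.slice(bangIndex + 2);
    return base + separator(base) + '_escaped_fragment_=' + escapeFragment(fragment);
  }
  if (hasFragmentMetaTag) {
    // HTML5-mode URL with <meta name="fragment" content="!">: empty value appended.
    return pageUrl + separator(pageUrl) + '_escaped_fragment_=';
  }
  // No hashbang and no meta tag: the crawler requests the URL unchanged.
  return pageUrl;
}

// Use "?" unless the URL already carries a query string.
function separator(url) {
  return url.indexOf('?') === -1 ? '?' : '&';
}

// Simplified assumption: percent-escape only %, #, &, + and space.
function escapeFragment(fragment) {
  return fragment.replace(/[%#&+ ]/g, function (ch) {
    return '%' + ch.charCodeAt(0).toString(16).toUpperCase();
  });
}
```

Calling `toCrawlerUrl('http://somesite.com/#!/crazy/101', false)` gives `http://somesite.com/?_escaped_fragment_=/crazy/101` (the leading slash of the fragment is carried over), while `toCrawlerUrl('http://somesite.com/crazy/101', true)` gives `http://somesite.com/crazy/101?_escaped_fragment_=`.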

How does HTML5 mode work behind the scenes?

It works using the History API introduced with HTML5, which allows scripts to change the browser's URL and manipulate history entries. Basically, it lets developers change the address shown in the browser without the browser actually making a request.
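
A minimal sketch of that idea, using the standard `history.pushState` call directly rather than AngularJS. `navigateWithoutReload` is a hypothetical name, and the history object is passed in as a parameter only so the snippet can also be loaded outside a browser.

```javascript
// Hypothetical helper. In a browser, pass window.history as historyApi.
function navigateWithoutReload(historyApi, path) {
  // pushState(state, unused title, url) adds a history entry and updates the
  // address bar; the browser does not send a request for the new URL.
  historyApi.pushState({ path: path }, '', path);
  return path;
}

// Browser usage, left commented out so the snippet has no side effects on load:
// navigateWithoutReload(window.history, '/crazy/101');
// window.addEventListener('popstate', function (event) {
//   // Back/forward pressed: re-render the view for event.state.path here.
// });
```

This is roughly what happens underneath when AngularJS runs in HTML5 mode: the URL changes on the client, and the server only sees that URL if the user (or a bot) requests it directly.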

Additional HTML5 mode explanation

Suppose your SPA runs at the domain root, http://somesite.com. Whenever the URL in the browser changes, it has been manipulated on the client. This means there is no actual content on the server at a sub-content URL such as http://somesite.com/crazy/101.

That's why the bot appends _escaped_fragment_ to the end: so you can serve static content instead of a 404 or a 301 redirect to the root (as the content doesn't exist on the server). This static content does nothing but return content. No processing, no SPA scripts. Pure content.
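
On the server, the decision could look something like the sketch below. It only works out which route a crawler is asking for and leaves the actual snapshot lookup aside. `resolveSnapshotRoute` is a hypothetical name, and the `http://somesite.com` base is just there so relative request paths can be parsed with the standard `URL` class.

```javascript
// Hypothetical helper: returns the route a crawler wants a snapshot for,
// or null when the request did not come through the crawling scheme.
function resolveSnapshotRoute(requestUrl) {
  var url = new URL(requestUrl, 'http://somesite.com');
  if (!url.searchParams.has('_escaped_fragment_')) {
    return null; // regular visitor: serve the SPA itself
  }
  var fragment = url.searchParams.get('_escaped_fragment_');
  // Hashbang mode: the route is the parameter's value.
  // HTML5 mode: the value is empty and the request path is the route.
  return fragment !== '' ? fragment : url.pathname;
}
```

Both `resolveSnapshotRoute('/?_escaped_fragment_=/crazy/101')` and `resolveSnapshotRoute('/crazy/101?_escaped_fragment_=')` resolve to `/crazy/101`, so the same snapshot can back either style of URL.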

Robert Koritnik
  • Now I'm even MORE confused, then: as far as #2, why would it append the escaped fragment to the end? That makes it impossible to know which page the crawler is requesting, in order to parse the request and serve the relevant snapshot... With the hashbang model, you always know which route/page is being requested, so you can intercept the request and serve the relevant snapshot. – J. Ky Marsh May 01 '14 at 21:31
  • @J.KyMarsh: Impossible to know? Why? It actually makes a request for the correct URL. Let me remind you that when the URL on the client side shows `http://somesite.com/crazy/101`, there was no direct request to the server. This URL was generated on the client side. Your SPA runs on `http://somesite.com` and all sub-content is purely on the client side. So whenever a bot requests some sub-content URL, you always know what it's requesting. And if there was no escaped fragment at the end, you should likely just redirect to `http://somesite.com` to serve your application. – Robert Koritnik May 01 '14 at 21:33
  • Does that mean that, with HTML5 mode and the meta tag, serving snapshots isn't even necessary? Is the meta tag even necessary? Sorry for the slew of questions; there seem to be numerous misconceptions about how SEO works for AJAX-based apps, and things appear to have changed drastically over the last few years, so I'm really trying to figure out what the correct, modern best practice is. – J. Ky Marsh May 01 '14 at 21:36
  • Serving snapshots is necessary as bots don't execute javascript. META tag is necessary exactly for that reason. Because in an SPA some/all of the content is dynamically generated using Javascript + Ajax. Maybe you should read a [Getting started guide on Google Webmasters](https://developers.google.com/webmasters/ajax-crawling/docs/getting-started) – Robert Koritnik May 01 '14 at 21:45
  • Have read that entire guide numerous times. Their portion on handling HTML5 is extremely brief. So to summarize what one might have to do with the HTML5 mode technique: 1.) add meta tag to index.html, 2.) on receiving request for a page, inspect the request to see if there's an '_escaped_fragment_' at the END, rather than before the requested fragment, as it would be for hashbang-style, 3.) if the fragment exists, send back the snapshot? So the difference lies in being able to GRAB the escaped fragment, for hashbang-style, and having to just look for it on the very end, for HTML5 mode? – J. Ky Marsh May 01 '14 at 21:50
  • I'm not sure what you mean by *look for it*. The query variable is there, whether with a value or not. If you're using hashbang, it holds the escaped fragment value, but in HTML5 mode it's empty and the URL itself gives you all the information about the requested resource instead. **And btw**: add the META tag to static snapshots as well in case they have links in them, so bots will generate correct requests for those as well. – Robert Koritnik May 01 '14 at 22:00
  • Right; that's what I mean. It's the difference between the escaped fragment containing the value (in hashbang mode), which you'd use to determine what page is being requested, and the escaped fragment being empty (in HTML5 mode), in which case you'd just use the requested URL as a whole. But you've otherwise answered my question(s): snapshots ARE still necessary, and you DO need the meta tag in HTML5 mode, it's just the manner in which parsing the URL differs between the two approaches due to how the escaped fragment is utilized. Thank you! – J. Ky Marsh May 01 '14 at 22:04