
We have a site with a feed similar to Pinterest and are planning to refactor the jQuery soup into something more structured. The two most likely candidates are AngularJS and Backbone+Marionette. The site is user-generated and mostly consumption-oriented (the typical 90/9/1 rule), with the ability for users to like, bookmark, and comment on posts. From the feed we open a lightbox to see more detail about the post with comments, related posts, etc., similar to Pinterest.

We have used Backbone sporadically and are familiar with the idea, but we're put off by the boilerplate. I assume Marionette would help a lot with that, but we're open to changing direction more radically (e.g. Angular) if it will help in the long term.

The requirements:

  • The initial page must be static for SEO reasons. It's important that the framework be able to start with existing content, preferably with little fight.
  • We would prefer to have the data needed for the lightbox already loaded in the feed so that the transition can be faster. Some of the data is already there (title, description, photos, num likes/num bookmarks, num comments), but there is additional data that would be loaded for the detail view - comments, similar posts, who likes this, etc.
  • Changes to the post that happen in the feed or the detail lightbox should be reflected in the other with little work (e.g. if I like it from the feed, I should see that like and the new like count if I open the lightbox - and vice versa).
  • We would like to migrate our mobile site (currently in Sencha Touch) to also use the same code base for the parts that are common so we can have closer feature parity between mobile and main site.

These requirements relate to my concerns about Angular:

1) Will it be possible/problematic to have initial page loads be static while rendering additional pages via the templates?

2) Is it problematic to have multiple data sources for different parts of the page - e.g. the main post part comes from embedded JSON data and from "see more"s in the feed, while the additional detail would come from a different AJAX call?

3) While the two-way binding is cool - I'm concerned it might be a negative in our case because of the number of items being rendered. The number of elements that need two-way binding is relatively small. Posts like:

concern me for our use-case. We can easily have hundreds of posts, each with 1-2 dozen details. Can the two-way binding be "disabled" for fields/elements that I know won't change?

Is it normal/possible to unload elements outside of the viewport to save memory? This is also connected to the mobile direction, because memory is even more of a concern there.

Would AngularJS work/perform well in our use-case? Are there any tricks/tips that would help here?


2 Answers


There are different methods of "infinite scroll" - or feed, as you put it. The needs of your users and the acceptable response payload size will determine which one you choose.

It seems that here you trade usability against performance.

1. Append assets

This method is your traditional append-to-bottom approach: when the user reaches the bottom of the current scroll height, another API call is made to "stack on" more content. Its benefit is being the most effective solution for handling cross-device caveats.

The disadvantage of this solution, as you have mentioned, is that large payloads flood memory as the user carelessly scrolls through content. There is no throttle.

<div infinite-scroll='getMore()' infinite-scroll-distance='0'>
  <ul>
    <li ng-repeat="item in items">
      {{item}}
    </li>
  </ul>
</div>

var page = 1;
$scope.getMore = function () {
  // Append the next page of results to the array ng-repeat is bound to
  $scope.items = $scope.items.concat(API.returnData(page));
  page++;
};

2. Append assets with a throttle

Here, we are suggesting that the user can continue to load more results into a feed that will append infinitely, but they must be throttled: they have to "manually" invoke the call for more data. How cumbersome this becomes is relative to the size of the content returned that the user scrolls through.

If there is a lot of content returned per payload, the user will have to click the "get more" button less often. This is, of course, at the tradeoff of returning a larger payload.

<div>
  <ul>
    <li ng-repeat="item in items">
      {{item}}
    </li>
  </ul>
</div>
<div ng-click='getMore()'>
  Get More!
</div>

var page = 1;
$scope.getMore = function () {
  // Same as above: fetch the next page and append it
  $scope.items = $scope.items.concat(API.returnData(page));
  page++;
};

3. Virtual Scroll

This is the last and most interesting way to infinite scroll. The idea is that you only store the rendered version of a range of results in browser memory. That is, complicated DOM manipulation only acts on the current range specified in your configuration. This, however, has its own pitfalls.

The biggest is cross-device compatibility.

If your handheld device has a virtual scrolling window that reaches the width of the device --- it had better be less than the total height of the page, because you will never be able to scroll past this "feed" with its own scroll bar. You will be "stuck" mid-page because your scroll will always act on the virtual scroll feed rather than the actual page containing the feed.

Next is reliability. If a user drags the scroll bar manually from a low index to one that is extremely high, you are forcing the browser to run these directives very, very quickly, which in testing has caused my browser to crash. This could be fixed by hiding the scroll bar, but of course a user could invoke the same scenario by scrolling very quickly.

Here is the demo

The source
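
For illustration, here is a minimal sketch of the ranged-rendering idea (not the demo's actual code: it assumes $scope.items already holds the full result set, and the scroll directive that reports the first visible index is hypothetical):

app.controller('VirtualFeedCtrl', function ($scope) {
  var WINDOW = 30; // posts kept in the DOM at any one time (assumed size)
  var start = 0;

  function render() {
    // Only this slice is ever handed to ng-repeat, so DOM work stays bounded
    // no matter how many items $scope.items holds in memory.
    $scope.visible = $scope.items.slice(start, start + WINDOW);
  }

  // Would be called by a (hypothetical) scroll directive with the index of
  // the first post currently in the viewport; re-centers the window on it.
  $scope.onScroll = function (firstVisibleIndex) {
    start = Math.max(0, firstVisibleIndex - WINDOW / 2);
    render();
  };

  render();
});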

"Initial page must static for SEO reasons. It's important that the framework be able to start with existing content, preferable with little fight."

So what you are saying is that you want the page to be pre-rendered server-side before it serves content? This approach worked well in the early 2000s, but almost everyone is moving away from it and going towards the single-page app style. There are good reasons:

  • The initial seed you send to the user acts as a bootstrap to fetch API data, so your servers do WAY less work.

  • Lazy loading assets and asynchronous web service calls make the perceived load time much faster than the traditional "render everything on the server first, then spit it back out to the user" approach.

  • Your SEO can be preserved by using a page pre-render / caching engine that sits in front of your web server and responds to web crawlers only with your "fully rendered version". This concept is explained well here.
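
As a rough illustration of that last point, a minimal Node/Express sketch (the snapshots store is a hypothetical cache; _escaped_fragment_ is the query parameter crawlers send under Google's AJAX crawling scheme):

var express = require('express');
var app = express();

var snapshots = {}; // hypothetical cache: path -> prerendered HTML snapshot

app.use(function (req, res, next) {
  // Crawlers following the AJAX crawling scheme request ?_escaped_fragment_=
  if (req.query._escaped_fragment_ !== undefined) {
    var html = snapshots[req.path];
    if (html) return res.send(html); // bots get the fully rendered version
  }
  next(); // regular users get the normal single-page app bootstrap
});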

"We would prefer to have the data needed for the lightbox already loaded in the feed so that the transition can be faster. Some of the data is already there (title, description, photos, num likes/num bookmarks, num comments), but there is additional data that would be loaded for the detail view - comments, similar posts, who likes this, etc."

If your initial feed payload does not contain the child data points for each "feed id" and you need to use an additional API request to load them in your lightbox --- you are doing it right. That's totally a legit use case. You would be arguing over 50-100ms for a single API call, which is imperceptible latency to your end user. If you absolutely need to send the additional payload with your feed, you aren't winning much.
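
A minimal sketch of that flow, assuming a hypothetical /api/posts/:id/details endpoint --- the lightbox opens instantly with the data already in the feed, then fills in the rest when the second call returns:

app.controller('LightboxCtrl', function ($scope, $http) {
  $scope.open = function (post) {
    $scope.post = post;    // title, photos, counts: already in the feed
    $scope.details = null; // comments, similar posts, who likes this
    $http.get('/api/posts/' + post.id + '/details').then(function (res) {
      $scope.details = res.data; // arrives ~50-100ms later
    });
  };
});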

"Changes to the post that happen in the feed or the detail lightbox should be reflected in the other with little work (e.g. if I like it from the feed, I should see that like and the new like count if I open the lightbox - and vice versa)."

You are mixing technologies here --- the Like button is an API call to Facebook. Whether those changes propagate to other instantiations of the Facebook Like button on the same page is up to how Facebook handles it; I'm sure a quick Google search would help you out.

Data specific to YOUR website, however --- there are a couple of different use cases:

  • Say I change the title in my lightbox and also want the change to propagate to the feed it's currently being displayed in. If your "save edit" action POSTs to the server, the success callback could trigger pushing the new value out over a websocket. This change would propagate not just to your screen, but to everyone else's screen.

  • You could also be talking about two-way data binding (AngularJS is great at this). With two-way data binding, your "model" - the data you get back from your web service - can be bound to multiple places in your view. This way, as you edit one part of the page that shares the model, the other updates in real time alongside it. This happens before any HTTP request, so it is a completely different use case.
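
A minimal sketch covering both cases (the PostStore service, the endpoint, and the post:updated event name are assumptions, not a definitive implementation):

// Feed and lightbox resolve posts through the same store, so they bind the
// exact same object and Angular's two-way binding keeps both views in sync.
app.factory('PostStore', function () {
  var posts = {}; // id -> post object, shared by reference
  return {
    put: function (post) { return (posts[post.id] = post); },
    get: function (id) { return posts[id]; }
  };
});

app.controller('FeedCtrl', function ($scope, $http, PostStore) {
  $scope.like = function (id) {
    var post = PostStore.get(id);
    post.likeCount++; // a lightbox bound to the same object updates immediately
    $http.post('/api/posts/' + id + '/like'); // persist in the background
  };
});

// The websocket case: patch the shared copy when someone else's save
// succeeds and the server broadcasts it (socket.io client assumed).
app.run(function ($rootScope, PostStore) {
  var socket = io.connect('/');
  socket.on('post:updated', function (msg) {
    $rootScope.$apply(function () {
      var post = PostStore.get(msg.id);
      if (post) angular.extend(post, msg.changes);
    });
  });
});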

"We would like to migrate our mobile site (currently in Sencha Touch) to also use the same code base for the parts that are common so we can have closer feature parity between mobile and main site."

You should really take a look at modern responsive CSS frameworks like Bootstrap and Foundation. The point of using responsive web design is that you only have to build the site once to accommodate all the different screen sizes.

If you are talking about feature modularity, AngularJS takes the cake. The idea is that you can export your website components into modules that can be used in another project. This can include views as well. And if you built the views with a responsive framework, guess what --- you can use it anywhere now.
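
For example (a sketch - the module names are made up), the common feed logic can live in its own module that both apps pull in:

// Shared directives, services, and controllers for the feed live here
angular.module('feed', []);

// The desktop and mobile apps each declare the shared module as a dependency
angular.module('desktopApp', ['feed']);
angular.module('mobileApp',  ['feed']);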

"1) Will it be possible/problematic to have initial page loads be static while rendering additional pages via the templates?"

As discussed above, it's really best to move away from these kinds of approaches. If you absolutely need it, templating engines don't care whether your payload was rendered server-side or client-side. Links to partial pages will be just as accessible.

"2) Is it problematic to have multiple data sources for different parts of the page - e.g. the main post part comes from embedded JSON data and from "see more"s in the feed, while the additional detail would come from a different AJAX call?"

Again, this is exactly what the industry is moving into. You will save in "perceived" and "actual" load time using an initial static bootstrap that fetches all of your external API data --- this will also make your development cycle much faster, because you are separating concerns of completely independent pieces. Your API shouldn't care about your view, and your view shouldn't care about your API. The idea is that both your API and your front-end code become modular / reusable when you break them into smaller pieces.
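
A sketch of that separation (the endpoint is assumed): the view only talks to a service, and the service is the single place that knows about the API.

app.factory('FeedService', function ($http) {
  return {
    page: function (n) {
      return $http.get('/api/feed', { params: { page: n } })
                  .then(function (res) { return res.data; });
    }
  };
});

// A controller consumes the service without knowing anything about URLs
app.controller('FeedListCtrl', function ($scope, FeedService) {
  FeedService.page(1).then(function (items) { $scope.items = items; });
});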

"3) While the two-way binding is cool - I'm concerned it might be a negative in our case because of the number of items being rendered. The number of elements that need two-way binding is relatively small."

I'm also going to combine this question with the comment you left below:

Thanks for the answer! Can you clarify - it seems that 1) and 2) just deal with how you would implement infinite scrolling, not the performance issues that might come from such an implementation. It seems that 3 addresses the problem in a way similar to recent versions of Sencha Touch, which could be a good solution

The performance issues you will run into are totally subjective. I tried to bring performance considerations like throttling into the discussion because throttling can drastically reduce the amount of stress on your server and the work your user's browser has to do with each new result set appended to the DOM.

Infinite scroll, after a while, will eat up your user's browser memory. That much I can tell you is inevitable, but only through testing will you be able to tell how much. In my experience, a user's browser can handle a great deal of abuse, but again, how big your payload is for each result set and what directives you are running on all of your results are totally subjective. There are solutions that render only a ranged data set, as in option three described above, but they have their limitations as well.

API data coming back shouldn't be any more than 1-2 KB in size and should only take about 50-200ms to return a query. If you aren't meeting those speeds, maybe it's time to re-evaluate your queries or cut down the size of the result set coming back by using child IDs to query other endpoints for specifics.

Dan Kanze
  • Thanks for the answer! Can you clarify - it seems that 1) and 2) just deal with how you would implement infinite scrolling, not the performance issues that might come from such an implementation. It seems that 3 addresses the problem in a way similar to recent versions of Sencha Touch, which could be a good solution. – Yehosef Jun 27 '13 at 09:14
  • Also - this answer addresses only the performance concerns. Do you know about the other concerns mentioned in the question? (1. static initial page and 2. multiple data sources) – Yehosef Jun 27 '13 at 09:15
  • @Yehosef I've expanded my answers to include your other questions. Let me know if you have any more questions. – Dan Kanze Jun 27 '13 at 23:13
  • About the SEO side - I read through the article and while it explains an approach - it's not clear if it's actually ok/helpful with Google. I've been under the impression from our SEO gurus that serving different content to bots vs regular users is a no-no. Another important consideration is that it's not really important to our users that we be a SPA. And just because a site could be - I'm not sure that it should be. E.g. - should yelp.com or instructables.com be written to be a SPA? We basically have a working site/flow, we just want a way to bring order to the JavaScript soup. – Yehosef Jun 30 '13 at 06:27
  • @Yehosef The approach works for Google! :) Look at this as well: http://backbonetutorials.com/seo-for-single-page-apps/ Your SEO gurus need not worry --- you're sending a rendered "snapshot" of the content that bots crawl through. The content is exactly the same. – Dan Kanze Jun 30 '13 at 06:52
  • @Yehosef I would think of SPA as a solid design pattern --- you don't lose anything but you gain a lot. To reiterate: Using SPA `pushState` is a throttle on all of your requests, because if you `GET` the next page that has duplicate data (HTML/CSS/JS/JSON), you aren't being DRY. Your user experience is way better. It's faster in both perceived and actual time. Your code becomes modular / reusable. Asynchronous lazy loading of assets / API data. – Dan Kanze Jun 30 '13 at 06:54
  • @Yehosef Sites that didn't go SPA style most likely didn't have the engineering capabilities at the time, or the technologies were still too young. These days there isn't really any excuse "not" to use this design pattern, because you don't have to sacrifice anything --- there are also really wonderful frameworks out there now that make building in this style cake. I would highly recommend AngularJS. – Dan Kanze Jun 30 '13 at 07:05
  • Thanks. I appreciate the pattern where it's a natural fit, and I understand there can be performance gains when using it, but I see risks when it's not needed (we've used pushState and it wasn't always a piece of cake - maybe they take care of all the details). Let's say we're not set on the SPA approach (our SEO gurus disagree with yours) - how much difficulty will we have with Angular? – Yehosef Jun 30 '13 at 14:20
  • The link that you give and http://www.yearofmoo.com/2012/11/angularjs-and-seo.html are solutions that focus on using PhantomJS. To me this seems like a hack compared to using server-loaded content (when you don't really need the performance or interactivity of a SPA). Do you know of big sites that rely on SEO for traffic and use a SPA approach? – Yehosef Jun 30 '13 at 14:38
  • @Yehosef pushState takes a little fooling around, but there are certainly valid Apache/nginx rules that make direct links play nice with SEO-friendly formats. For older browsers, AngularJS falls back to a hashbang (`#!`) automatically. But let's say we aren't using an SPA and the content needs to be fully rendered server-side before it reaches the user. Now you would be talking about server-side templating engines like Smarty (PHP) or Jade for HTML. – Dan Kanze Jun 30 '13 at 16:51
  • @Yehosef The JavaScript soup could certainly be organized with AngularJS for event logic like modals and user forms --- but beware: don't rely on directives to format content, because that happens after the payload is received. For example, say you have a directive that turns your all-uppercase feed posts to all lowercase. Google will snapshot the all-uppercase version, because the directive runs after the payload is received. `ng-repeat` is another great example --- you can't SEO a "for each feed post" because `ng-repeat` is another directive --- the fallback being a templating engine. – Dan Kanze Jun 30 '13 at 16:56
  • @Yehosef Yes, the PhantomJS solution is a hack, but it's not flimsy --- it's targeting web crawlers very specifically. Big sites that are SPA and rely on SEO that I can roll off the top of my head: http://linkedin.com or http://apple.com. – Dan Kanze Jun 30 '13 at 17:04

The main thing that remains unanswered in Dan's answer is the initial page load. We're still not satisfied with the approach that we have to do it client-side - it still seems to us that there is a risk for SEO and initial page load. We have a fair amount of SEO traffic and are looking for more - these people come to our site with no cache, and we have a few seconds to catch them.

There are a few options to handle angular on the server side - I'll try to collect some of them here:

https://github.com/ithkuil/angular-on-server

https://github.com/ithkuil/angular-on-server/wiki/Running-AngularJS-on-the-server-with-Node.js-and-jsdom

https://github.com/angular/angular.js/issues/2104

Will add more as they come up.

Yehosef
  • If your implementation team is having trouble implementing the SEO solution for SPAs, why not try a paid service like http://www.brombone.com/ or even a free service (you need to set up your own S3) through http://www.blitline.com/docs/seo_optimizer – Dan Kanze Aug 08 '13 at 21:41
  • Thanks for the links - great to know about. We're currently transitioning the code that's just for logged-in (non-bot) users, but it's a problem we're probably going to get to - we'll look into these solutions and see if they are viable. – Yehosef Aug 12 '13 at 10:20
  • Note their whole pitch is that "google can't crawl your angular site - but we'll scrape it for you so they'll get plain html". It seems to reinforce our concerns. Quote: "can the Googlebot run all that javascript? The answer is: no. But don't worry. The dream is not dead. We have a solution for you." – Yehosef Aug 12 '13 at 10:26
  • This is of course the same solution offered by Google... For SEO, you don't care whether the bot can "run" your JavaScript. You just care that it can crawl the HTML. Also, whether or not web developers like it, Google is in bed with W3C to reform web standards with technologies like AngularJS and Polymer (a global API for web frameworks)... in other words, this is the direction the web is taking and there's nothing we can do about it. – Dan Kanze Aug 13 '13 at 14:19
  • The problem is that Google won't spend much time/CPU running all my JavaScript to see what the page should look like - these services charge you for that privilege. It's also not clear if you are penalized for having different bot/non-bot traffic - our SEO people say there is a hit, but it could be that's just if you're doing keyword stuffing - SPAs might not have a problem. My belief is that the majority of SPA apps aren't built for SEO/bots, so the entire issue is moot for them. For those that need SEO, I'm skeptical that they will run something as CPU-expensive as PhantomJS to render my pages. – Yehosef Aug 13 '13 at 15:46
  • The solution using PhantomJS does not "render" your pages per fetch. It caches them and points web crawlers to this rendered version. Pointing crawlers to different resources than requested is an SEO concern, but your SEO team should know the difference between 301/302 redirects and what is right for your implementation team. – Dan Kanze Aug 13 '13 at 15:54
  • What I meant by "render" is turn it into HTML. You can use other DOM-only engines like Cheerio, which will probably be fine as long as the code doesn't make any DOM decisions based on layout/screen size, etc. PhantomJS is the "best" solution because the result will be exactly as it would be in a WebKit browser. – Yehosef Aug 14 '13 at 06:41
  • Obviously we would cache this and we can expire it when needed. The entire discussion here is not how we can "render" the Angular as HTML but whether we NEED to. Your claim before was that we don't need to b/c Google is smart enough to figure it out - I still maintain that the evidence is to the contrary and we need to "render" the content for them to spider it. – Yehosef Aug 14 '13 at 06:46
  • Yes, if I was unclear: you absolutely need to prerender the content. – Dan Kanze Aug 15 '13 at 03:21
  • ok - thanks for the clarification. I'm sure as time goes on that may change - but it's valuable to know the current state. – Yehosef Aug 15 '13 at 07:32