
I'm quite interested in Facebook's BigPipe technique for improving user experience when displaying web pages. The downside is that it is heavily Javascript-based and not at all search-engine friendly.

When developing a similar technique on my own website, I designed it so that it can very easily be disabled server-side, in order to serve standard, non-BigPipe pages. Now I'm looking for a way to make it crawler-friendly.

  • The easy way would be to serve non-BigPipe content to search engine crawlers / bots, and pipelined content to everyone else. This should not be considered cloaking: the content is exactly the same, and the layout is the same (after BigPipe's Javascript has been executed). The only thing that changes is the way it is delivered, to make it more crawler-friendly. But will Google see this as legitimate?

  • The second way would be to use Javascript itself to solve the problem. On the first request, send the non-BigPipe page, including some Javascript that sets a cookie. On subsequent requests, send BigPipe content only if the cookie is present. The very first page load will not be optimized, but the others will be. Looks like a great solution, but I don't really like multiplying cookies. (A rough sketch of how these first two options could be combined server-side follows this list.)

  • The third way would be to stream the BigPipe content not using HTML comments as Facebook does, but using <noscript> tags. This would make a pagelet look like:

    <noscript id="pagelet_payload_foo">Some content to be indexed here</noscript>
    <script>onPageletArrive({id:'foo', [...]})</script>
    

    instead of Facebook's approach:

    <code id="pagelet_payload_foo"><!-- Some content to be indexed here --></code>
    <script>onPageletArrive({id:'foo', [...]})</script>
    

    This looks great and simple: both crawler-friendly and user-friendly. But it seems a little hackish to me, and it does not work in IE 7/8 because the contents of the <noscript> tag are ignored in the DOM. That would involve some dirty special-casing for these browsers.
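
Here is a rough sketch of how I imagine the first two options could be combined server-side, in PHP. All the names (the helpers, the cookie, the two rendering functions) are just placeholders:

    <?php
    // Rough sketch; isSearchBot, shouldEnableBigPipe, the "bp" cookie,
    // renderWithBigPipe and renderInline are all placeholders.

    function isSearchBot() {
        // Naive user-agent sniffing; a real check should be more robust.
        $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
        return (bool) preg_match('/googlebot|bingbot|slurp/i', $ua);
    }

    function shouldEnableBigPipe() {
        if (isSearchBot()) {
            return false;             // option 1: plain page for crawlers
        }
        return isset($_COOKIE['bp']); // option 2: only once JS has set the cookie
    }

    if (shouldEnableBigPipe()) {
        renderWithBigPipe();  // pagelets flushed progressively
    } else {
        renderInline();       // standard, crawler-friendly page
        // Let capable browsers opt in for subsequent requests:
        echo "<script>document.cookie = 'bp=1; path=/';</script>";
    }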

Then I looked more closely at what Facebook does, and it seems they are doing the same. Pages are optimized in my browser, but not in Google's cache. I tried clearing all my browser cache and cookies and requested the page again. No matter what, I kept getting the content through BigPipe, so they are not using any cookie-based technique.

So the question is simple: how does Facebook do it? Would the first method be considered cloaking, or does it only work for Facebook because it is Facebook? Or did I miss something else?

Thanks.

Pierre
  • I just NEED to ask you something: How have you achieved the "I designed it so it can very easily be disabled"? – Martin. May 07 '12 at 02:29
  • Very simple: when I don't want to enable BigPipe, the page contents are rendered inline instead of being deferred to the end of page generation. The template tag I created simply has two different behaviors. – Pierre May 13 '12 at 06:54
  • So you simply do something like `if ($enabled && !$not_closed_html) { echo "…"; echo bigPipe($html); } elseif ($enabled) { echo bigPipe($html) }`? – Martin. May 13 '12 at 20:02

1 Answer


The easy answer is that Facebook discriminates between search bots and regular users and serves them different content. That can be done via the user agent (as I think you're implying in your Q) or by looking up the IP address to see whether it matches a Google address range.
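
For the IP check, the usual approach (and the one Google itself recommends for verifying Googlebot) is a reverse DNS lookup followed by a forward lookup to confirm. Something along these lines in PHP, with the function name purely illustrative:

    <?php
    // Verify Googlebot by IP: reverse-resolve, check the domain,
    // then forward-resolve to confirm the hostname maps back to the IP.
    function isVerifiedGooglebot($ip) {
        $host = gethostbyaddr($ip);  // e.g. crawl-66-249-66-1.googlebot.com
        if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
            return false;
        }
        return gethostbyname($host) === $ip;  // forward lookup must match
    }

    if (isVerifiedGooglebot($_SERVER['REMOTE_ADDR'])) {
        // serve the plain, non-BigPipe version
    }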

The fully static version would be my preference, because it also permits you to optimise for speed, something that Google (and perhaps others) takes into account in its indexing.

Phil H
  • You mean not using BigPipe at all? BigPipe is a speed optimization: it lets the user start interacting with the page and lets resources load promptly. Disabling it has almost no impact on page generation time, but it has a significant impact on the perceived speed of the page: even when including Javascripts at the end of the page, the browser has to wait for all of the content to be rendered (even the least important parts, like the sidebar or ads) before it starts loading scripts and lets the user interact with the page. And... isn't the IP / UA approach considered cloaking by Google? – Pierre Feb 10 '12 at 09:53
  • It's cloaking only if a site serves different content to a bot and to a normal user – FelipeAls Aug 18 '12 at 05:41