I want to use the #! token to make my GWT application crawlable, as described here: http://code.google.com/web/ajaxcrawling/

There is a GWT sample app available online that uses this scheme. For example, take: http://gwt.google.com/samples/Showcase/Showcase.html#!CwRadioButton

For that URL, Googlebot instead requests the following one, which serves a static webpage: http://gwt.google.com/samples/Showcase/Showcase.html?_escaped_fragment_=CwRadioButton

I want my GWT app to do something similar. In short, I'd like to serve a different flavor of the page whenever the _escaped_fragment_ parameter is found in the URL.

What should I modify in order for the server to serve something else (a static page, or a page dynamically generated through a headless browser like HTML Unit)? I'm guessing it could be the web.xml file, but I'm not sure.

(Note: I thought of checking the Showcase app provided with the GWT SDK, but unfortunately it doesn't seem to support serving static files on _escaped_fragment_, and it doesn't use the #! token.)

Philippe Beaudoin
  • Cross-posted on GWT Google Group. You might want to read the answers there too: http://groups.google.com/group/google-web-toolkit/browse_thread/thread/15a922e701e9e2db?hl=en – Philippe Beaudoin Mar 12 '10 at 16:49
  • I've posted a question to spark more discussion around this topic. "Making AJAX Applications Crawlable? How to build a simple web service on Google App Engine to produce HTML Snapshots?" http://stackoverflow.com/questions/3517944/making-ajax-applications-crawlable-how-to-build-a-simple-web-service-on-google-a – Chris Jacob Aug 19 '10 at 01:12

2 Answers

If you want to use web.xml alone, I don't think a servlet-mapping will work, because url-patterns ignore the GET parameters. (I'm not 100% sure whether there is another way to make this possible.)

You could of course map Showcase.html to a servlet and decide in that servlet what to do, based on the GET parameter "_escaped_fragment_". But it's a little expensive to call a servlet just to serve a static page for the majority of requests. (Not too bad, but still. You could set cache headers if you're sure the page doesn't change.)
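
The decision such a servlet would make can be sketched roughly as follows. This is only an illustration, not code from any GWT sample: the class and method names (EscapedFragment, extract) are made up, and it works on the raw query string so that it runs with the JDK alone. Inside a real servlet, the container's request.getParameter("_escaped_fragment_") would normally do this parsing for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.Optional;

// Hypothetical helper (not part of GWT or the servlet API).
final class EscapedFragment {
    static final String PARAM = "_escaped_fragment_";

    /** Returns the decoded fragment if the raw query string carries the parameter. */
    static Optional<String> extract(String rawQuery) {
        if (rawQuery == null || rawQuery.isEmpty()) {
            return Optional.empty();
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            if (PARAM.equals(name)) {
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                try {
                    // The crawler percent-encodes special characters in the fragment.
                    return Optional.of(URLDecoder.decode(value, "UTF-8"));
                } catch (UnsupportedEncodingException e) {
                    throw new IllegalStateException(e); // UTF-8 is always available
                }
            }
        }
        return Optional.empty();
    }
}
```

If extract(...) returns an empty Optional, the request is an ordinary one and the regular Showcase.html can be served; otherwise the returned value tells you which application state to render as a static page.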

Or you could put an Apache server or something similar in front of your server, but I understand if you'd rather not; I wouldn't like to have to do that either. Maybe your Java EE server (which one are you using, BTW?) provides some mechanism for URL filtering before the request gets passed on to the web container. I'd like to know that, too!

Chris Lercher
  • Thanks for the insightful comment. I'm running my project on Google App Engine, so I'm not sure how much access I have to the servlet container (or even which one I'm using)... I could probably find out, though. After some reading, I was thinking that filters could do the trick, if only they receive the GET parameter. I'll check it out. – Philippe Beaudoin Mar 12 '10 at 16:46
  • Using a Filter is pretty similar to using a Servlet in this case (I don't think there's much of a performance difference). In any case, you should be able to retrieve the GET parameter via servletRequest.getParameter(...) – Chris Lercher Mar 12 '10 at 17:54
  • Makes sense. However, the filter makes it possible to continue to the rest of the filter chain if I don't want to handle the request (i.e. the _escaped_fragment_ parameter is not present). Is there a way to do the same with a servlet? – Philippe Beaudoin Mar 12 '10 at 22:48
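
The pass-through behaviour discussed in these comments can be illustrated with a small sketch. Note the assumption: the servlet API is not part of the JDK, so this uses the JDK's built-in com.sun.net.httpserver.Filter as a stand-in. It is not the servlet API, but a javax.servlet.Filter has an analogous doFilter method that receives a FilterChain. The class name EscapedFragmentFilter and the placeholder snapshot are made up for the example.

```java
import com.sun.net.httpserver.Filter;
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Stand-in for a servlet filter, built on the JDK's own HTTP server classes.
final class EscapedFragmentFilter extends Filter {
    static final String PARAM = "_escaped_fragment_";

    /** True if the raw query string contains the _escaped_fragment_ parameter. */
    static boolean isCrawlerRequest(String rawQuery) {
        if (rawQuery == null) {
            return false;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            if (PARAM.equals(name)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void doFilter(HttpExchange exchange, Chain chain) throws IOException {
        if (!isCrawlerRequest(exchange.getRequestURI().getRawQuery())) {
            chain.doFilter(exchange); // not a crawler request: pass it along untouched
            return;
        }
        // Placeholder snapshot; a real one might be a static file or headless-browser output.
        byte[] body = "<html><body>snapshot</body></html>".getBytes(StandardCharsets.UTF_8);
        exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    @Override
    public String description() {
        return "Serves an HTML snapshot when _escaped_fragment_ is present";
    }
}
```

The point of the sketch is the early chain.doFilter(...) call: requests without the parameter continue down the chain exactly as if the filter were not there.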

Found my answer! The Showcase sample supporting crawlable hyperlinks is in the following branch: http://code.google.com/p/google-web-toolkit/source/browse/branches/crawlability/samples/showcase/?r=7726

It defines a filter in the web.xml to redirect URLs with the _escaped_fragment_ token to the output of HTML Unit.
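
The filter from that branch is not reproduced here. As a rough sketch of one step such a filter presumably needs, the following maps an _escaped_fragment_ URL back to the #! URL that a headless browser would then load. The class and method names (PrettyUrl, fromCrawlerUrl) are made up for illustration, and the code uses only the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper (not taken from the GWT crawlability branch).
final class PrettyUrl {
    private static final String MARKER = "_escaped_fragment_=";

    /** Rebuilds the #! URL from a URL carrying _escaped_fragment_; other URLs are returned unchanged. */
    static String fromCrawlerUrl(String crawlerUrl) {
        int q = crawlerUrl.indexOf('?');
        if (q < 0) {
            return crawlerUrl;
        }
        String base = crawlerUrl.substring(0, q);
        StringBuilder kept = new StringBuilder(); // query parameters other than the marker
        String fragment = null;
        for (String pair : crawlerUrl.substring(q + 1).split("&")) {
            if (pair.startsWith(MARKER)) {
                fragment = decode(pair.substring(MARKER.length()));
            } else if (!pair.isEmpty()) {
                if (kept.length() > 0) {
                    kept.append('&');
                }
                kept.append(pair);
            }
        }
        if (fragment == null) {
            return crawlerUrl;
        }
        StringBuilder url = new StringBuilder(base);
        if (kept.length() > 0) {
            url.append('?').append(kept);
        }
        return url.append("#!").append(fragment).toString();
    }

    private static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

With the Showcase URL from the question, fromCrawlerUrl("http://gwt.google.com/samples/Showcase/Showcase.html?_escaped_fragment_=CwRadioButton") gives back http://gwt.google.com/samples/Showcase/Showcase.html#!CwRadioButton.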

user988346
Philippe Beaudoin
  • Just to comment on the above. This would probably work if you use your own Tomcat. For my part, I'm running on Google App Engine. The problem right now is that HTML Unit doesn't run on App Engine, but it might do so soon, according to: http://www.google.com/url?sa=D&q=https://sourceforge.net/tracker/index.php%3Ffunc%3Ddetail%26aid%3D2962074%26group_id%3D47038%26atid%3D448269%23&usg=AFQjCNGGJuWPDqFfUuc4k44HormgSgEM6g – Philippe Beaudoin Mar 14 '10 at 21:12
  • That branch no longer exists, i.e., this is no longer a helpful, much less correct, answer. – antony.trupe Jan 26 '11 at 03:28