It sounds as though you'd like to build a scalable backend infrastructure that ultimately will be used to do the following:
- Serve content. This is the web server layer.
- Perform some type of back-end processing for user requests coming in from the web server layer, and communicate with the data store. Call this the application server layer.
- Save session state and user data in a distributed, fault-tolerant, eventually consistent key-value store.
Also, it sounds as though you want to do this using commodity PC hardware.
This is a tall order.
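To make the "eventually consistent key-value store" requirement a bit more concrete, here is a minimal single-process sketch in Python. Everything in it (the `Replica` class, the `sync` function, the caller-supplied version numbers, the `session:42` key) is made up for illustration; real stores such as Cassandra or HBase handle versioning, networking and failures in far more careful ways. The only point is that replicas may disagree for a while and then converge once they exchange data.

```python
class Replica:
    """One copy of the data. Hypothetical class, for illustration only."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def put(self, key, value, version):
        # Last-write-wins: keep the value carrying the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        # Pull every entry from another replica; put() discards stale ones.
        for key, (version, value) in other.data.items():
            self.put(key, value, version)


def sync(replicas):
    """Naive all-pairs exchange standing in for background replication."""
    for target in replicas:
        for source in replicas:
            if target is not source:
                target.merge_from(source)


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")

# Two writes for the same session land on different replicas.
r1.put("session:42", "cart=1", version=1)
r2.put("session:42", "cart=2", version=2)

# Before replication the replicas disagree (r3 has nothing yet).
before = [r.get("session:42") for r in (r1, r2, r3)]

sync([r1, r2, r3])

# After replication every replica returns the newest write.
after = [r.get("session:42") for r in (r1, r2, r3)]
print(before)
print(after)
```

The trade-off is that a read between the write and the sync can return old data, which is usually acceptable for session state but not for everything.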
Foursquare uses Scala with the Lift framework, and Jetty as their web server. Here's more. And more.
Facebook uses many different technologies. I know that for their data store they use HBase (they were previously using Cassandra).
Yahoo uses HBase to keep track of user statistics.
Twitter started as a Ruby-backed web site and later moved to Scala. Twitter is incrementally moving from MySQL (I assume sharded) to Cassandra using their proprietary incremental database conversion tool.
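Since "sharded" may be unfamiliar: sharding means splitting rows across several database servers and routing each request to the server that owns the key. Below is a minimal Python sketch of hash-based routing. The shard names and the `shard_for` function are hypothetical, and this is not a description of how Twitter (or anyone else mentioned here) actually shards.

```python
import hashlib

# Hypothetical shard names, for illustration only.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(user_id, shards=SHARDS):
    """Map a user id to one shard, deterministically."""
    # Use a stable hash (not Python's built-in hash(), which is
    # randomized per process for strings) so every app server agrees.
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


for user in ["alice", "bob", "carol"]:
    print(user, "->", shard_for(user))
```

Note that plain modulo routing remaps most keys whenever the number of shards changes, which is one reason migrating a sharded data store tends to be done incrementally.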
As far as scaling on the application server and web server end, what really counts is having a language or runtime that can cheaply spawn lightweight worker processes in user space, plus a manager process that hands incoming requests to those workers. Think of it as running a very efficient company: the more work you've got coming in, the more people you hire. This is roughly the Actor model, in which workers share no state and communicate only by passing messages. Some languages have actors built in (Erlang); others have actors implemented as frameworks (Akka) or libraries (Scala's native actors). Apparently, Scala's native actors are buggy, so some people got together and implemented the Akka framework for Scala and Java.
There's a lot of discussion online regarding actors and which language and libraries one should use. Erlang has a lot going for it out of the box, namely actors and the OTP libraries, but apparently does not have the rich libraries that Java has. Scala, on the other hand, runs on the JVM and allows you to reuse a lot of the existing Java web libraries (which could cause issues if they happen to have static objects declared in them). So, for me it really boils down to Scala (with Akka) or Erlang.
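The manager-and-workers idea can be sketched in a few lines of Python. This is only an illustration of the message-passing shape, not Akka or Erlang code: the `Actor` and `Manager` classes and the `handle` function are made up for this example, and OS threads are far heavier than Erlang processes or Akka actors.

```python
import queue
import threading


class Actor:
    """A worker with a private mailbox; it handles one message at a time."""

    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.mailbox = queue.Queue()
        self.results = []  # written only by this actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # The only way to talk to an actor is to put a message in its mailbox.
        self.mailbox.put(message)

    def _run(self):
        while True:
            message = self.mailbox.get()
            if message is None:  # None is our stop signal
                break
            self.results.append(self.handler(message))

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()


class Manager:
    """Hands incoming requests to workers in round-robin order."""

    def __init__(self, worker_count, handler):
        self.workers = [Actor("worker-%d" % i, handler)
                        for i in range(worker_count)]
        self._next = 0

    def dispatch(self, request):
        worker = self.workers[self._next % len(self.workers)]
        self._next += 1
        worker.send(request)

    def shutdown(self):
        # Stop every worker, then collect what each one produced.
        for worker in self.workers:
            worker.stop()
        return [r for worker in self.workers for r in worker.results]


def handle(request):
    # Stand-in for real request processing.
    return request.upper()


manager = Manager(3, handle)
for request in ["/home", "/login", "/search", "/cart"]:
    manager.dispatch(request)
results = manager.shutdown()
print(sorted(results))
```

"Hiring more people" corresponds to raising `worker_count`. In Erlang or Akka you would typically also have supervisors restarting failed workers, which this sketch leaves out.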
For the web server, with Scala, you can use any Java app server. Foursquare uses Jetty for most things. It's not written in Scala, but since Scala compiles down to bytecode that runs on the JVM, it interoperates easily with any Java app server.
People also say that there aren't that many Erlang programmers and that Erlang is harder to learn (functional programming vs imperative programming). Scala is functional and imperative at the same time (meaning you can do either), while Erlang is functional. Now, functional programming has a lot going for it: its proponents argue that one expert functional programmer can get a lot more done than an expert imperative programmer. Yahoo Store was originally written and maintained in Lisp (a functional language) by a very small team. On the other hand, imperative programming is easier to learn and widely used in team settings. Imperative languages are good for some things, functional languages for others. The right tool for the right job.
Back to the web server discussion: with Erlang, you can use Yaws, or you can run a framework (Chicago Boss).
Here's more on the Scala vs Erlang debate.
Another link.
More here.
And another.
Another opinion.
On the database end, you have a lot of choices. See here.
You can even eschew a separate database altogether and save your data in Mnesia (the distributed database that ships with Erlang/OTP).
My answer is not complete, as this topic (scaling app servers, databases and web servers) is very complicated and full of debate. Some frameworks even blur the distinction between the tiers (web server, application server, database) and integrate a lot of the functionality of these layers within the framework itself.