
I know this is a very vague question, but I'm looking for a very abstract answer. Since I started using GAE a few months ago, I've always shrugged off memcache as an unnecessary hassle that isn't really important. But memcache seems to be praised as a highly beneficial feature; Google even says "High performance scalable web applications often use a distributed in-memory data cache in front of or in place of robust persistent storage for some tasks." So I thought there must be something worth looking into here.

I just don't get it. How is it good for performance? First you have to check whether something is in memcache, and if it isn't, run the query anyway. I always thought it would be quicker to skip that step and just query, but maybe that's a naive approach? How much of a difference does it make?

I guess what I never understood is where memcache is useful. I can see how it helps on something like the Stack Overflow home page, where all users see pretty much the same thing; in fact it would be silly not to use memcache there. But take a social network like Facebook. Every user sees something different: no two people see the same data and content, and things change so fast that memcache would probably need constant updating. What role can memcache play in a scenario like that?

Also, on a private website like a social network, how much can memcache really hold if every user needs different information cached? I know GAE doesn't publish the size of its memcache, so would it be safe to store hundreds of thousands of records?

Snowman
  • "memcache" = in-memory cache. Memory access can be orders of magnitude faster than making a query to a database, whose data is probably on disk. The cache, by nature, will acquire objects that are accessed most often. – Jonathon Reinhart Oct 25 '12 at 04:03
  • I recommend searching Google for "memcached at facebook". You will find a lot of good information. – sahid Oct 25 '12 at 08:52

2 Answers


You use memcache for things you're likely to need frequently. Checking to see if something is in memcache is fast. Getting it from memcache is fast. If you can save yourself having to do a query, you save a ton of time.

For instance, consider the following two scenarios:

  • Two people request your homepage. Both requests do a query as part of the page load.
  • Two people request your homepage. Both check the cache; one does a query and stores it, the other gets the cached results.
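The second scenario is the usual "check the cache first, fall back to the query on a miss" pattern (often called cache-aside). A minimal sketch, assuming a plain Python dict as a stand-in for memcache and a hypothetical run_homepage_query() in place of a real datastore query; on App Engine the dict lookups would become memcache get/set calls:

```python
import time

# Hypothetical stand-ins: a plain dict plays the role of memcache, and
# run_homepage_query() simulates a slow datastore query.
cache = {}
query_count = 0

def run_homepage_query():
    global query_count
    query_count += 1
    time.sleep(0.1)  # pretend the query takes ~100 ms
    return ["post 1", "post 2", "post 3"]

def get_homepage_data():
    # 1. Check the cache first.
    data = cache.get("homepage")
    if data is None:
        # 2. Cache miss: run the query and store the result for later requests.
        data = run_homepage_query()
        cache["homepage"] = data
    return data

# Two requests: the first misses and queries, the second is served from the cache.
first = get_homepage_data()
second = get_homepage_data()
print(query_count)  # prints 1
```

Only the first request pays for the query; every later request until the entry disappears is a cheap lookup.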

Ballpark time to get something from memcache (or store it) is roughly 2-3ms. Ballpark time to get something from the DB is around 100ms.

So for the first scenario, you have 100ms x2 = 200ms total.

In the second scenario, you have 3ms (failed lookup) + 100ms (query) + 3ms (store) + 3ms (successful lookup) = 109ms total.

You've saved almost 50% overall.

Now consider that maybe 10 people request your homepage. Each additional person in the first scenario would be another 100ms. Each person in the second scenario is only 3ms.
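To make the arithmetic above concrete, here is a small calculation using the answer's ballpark figures (3ms per cache operation, 100ms per query). It assumes the cached entry is not evicted between requests, which memcache does not guarantee:

```python
# Ballpark figures from the answer above (rough estimates, not measurements).
QUERY_MS = 100
CACHE_MS = 3

def uncached_total(n):
    # Every request runs the query.
    return n * QUERY_MS

def cached_total(n):
    # The first request misses (lookup + query + store); the rest are cache hits.
    if n == 0:
        return 0
    return CACHE_MS + QUERY_MS + CACHE_MS + (n - 1) * CACHE_MS

for n in (2, 10, 100):
    print(n, uncached_total(n), cached_total(n))
# prints:
# 2 200 109
# 10 1000 133
# 100 10000 403
```

The gap grows with traffic: the uncached cost scales at 100ms per request, the cached cost at 3ms per request after the first.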


Also note that you don't have to store an entire page at a time in memcache. You can store parts of pages as well. Sure, not all users have all the same data, but there are certainly some things that are shared between them.
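Caching parts of a page might look like the following sketch. It assumes a dict as a stand-in for memcache; render_shared_sidebar(), get_fragment(), and the "fragment:sidebar" key are hypothetical names for illustration. The shared fragment is rendered once and reused, while the per-user part is built on every request:

```python
# Hypothetical sketch: cache the parts of a page that every user shares,
# and render only the user-specific part on each request.
cache = {}
render_calls = {"sidebar": 0}

def render_shared_sidebar():
    # Stand-in for an expensive render that is identical for all users.
    render_calls["sidebar"] += 1
    return "<div>Trending topics</div>"

def get_fragment(key, render):
    # Cache-aside for a single fragment: reuse it if cached, else render and store.
    html = cache.get(key)
    if html is None:
        html = render()
        cache[key] = html
    return html

def render_page(username):
    sidebar = get_fragment("fragment:sidebar", render_shared_sidebar)
    greeting = f"<p>Hello, {username}</p>"  # per-user, not cached
    return greeting + sidebar

page_a = render_page("alice")
page_b = render_page("bob")
```

The two users get different pages, but the shared sidebar is rendered only once.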

Amber
  • I know there are no concrete numbers, but as a ballpark estimate, how large is memcache typically on GAE? 1 MB? 100 MB? A gigabyte? – Snowman Oct 25 '12 at 04:23
  • http://stackoverflow.com/questions/2175586/how-much-memory-of-memcache-is-available-to-a-google-app-engine-account – Amber Oct 25 '12 at 04:24
  • Hmm... I'm just having a hard time getting a mental picture. What is at least the right unit of measurement? For some reason I compare it to a standard personal computer hard drive cache, which I always see as 2 MB or 8 MB. Are we talking about the same thing here? – Snowman Oct 25 '12 at 04:27
  • Individual memcache values are limited to 1 MB *each*. You can have many values in memcache. – Amber Oct 25 '12 at 04:47
  • @mohabitar memcache resides in RAM, which nowadays is gigabytes in size; the 2 MB or 8 MB cache you're referring to is a small buffer built into the drive itself. – 1in9ui5t Oct 25 '12 at 21:02

Size

You can't rely on the size of memcache. There have been various attempts at figuring out how much memcache each app gets, and the results vary.

Reliability

It's not reliable. How long your memcache entries live depends on factors that are a black box to users, for example how often they're requested and how long it's been since the last request.

Google will try to keep often-used entries around, but will otherwise remove entries at times that look random from your perspective, so your code must always be able to fall back to the datastore.
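App Engine does not publish its eviction policy, so the following is only an illustration of why a rarely used entry can vanish while a popular one survives: a toy least-recently-used (LRU) cache, the kind of policy classic memcached is built around, using collections.OrderedDict. The LRUCache class and the capacity of 2 are assumptions for illustration:

```python
from collections import OrderedDict

class LRUCache:
    """Toy cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None  # a miss is normal: the caller must fall back to the datastore
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry

cache = LRUCache(capacity=2)
cache.set("user:1", "Alice")
cache.set("user:2", "Bob")
cache.get("user:1")           # user:1 is now the most recently used
cache.set("user:3", "Carol")  # evicts user:2, the least recently used
print(cache.get("user:2"))    # prints None
print(cache.get("user:1"))    # prints Alice
```

Whatever the real policy is, the lesson is the same: treat a `None` from the cache as an ordinary outcome, not an error.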

Tips

Try to use it for items that are shared by everyone, requested often, or expensive to calculate. User entities are a good example: once a user is loaded, they'll likely interact with your app more than once. Another might be a template into which you put user-specific data, or the full page if only one small part of it will change. For Facebook it might be entire templates, sections such as "these friends are on chat", company-wide segments such as "you might like to try this game", or a new post that will be pushed to many currently-online friends.

Use different cache periods for each type of entity. Setting a shorter expiry time on some entries than on others hints to App Engine which ones are useful to keep around longer (though an expiry time is only an upper bound; entries can still be evicted sooner). A cached page might only be useful for 30 seconds, while a user entity might be useful for an hour.
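Per-type cache periods can be sketched with a toy cache where each entry carries its own expiry time. This assumes a dict-backed TTLCache class (a hypothetical name) and an injectable clock so the example can advance time without sleeping; the 30-second and one-hour values come from the text above. The real memcache API lets you pass an expiration time on each set instead:

```python
import time

class TTLCache:
    """Toy cache in which each entry carries its own expiry time."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.items = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self.items[key] = (value, self.clock() + ttl_seconds)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.items[key]  # expired: treat as a miss
            return None
        return value

# Hypothetical expiry times per kind of data, following the example in the text.
PAGE_TTL = 30      # a rendered page goes stale quickly
USER_TTL = 3600    # a user entity changes rarely

# A fake clock so the example can "advance time" without sleeping.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("page:home", "<html>...</html>", PAGE_TTL)
cache.set("user:42", {"name": "Alice"}, USER_TTL)

now[0] = 60.0  # one minute later
print(cache.get("page:home"))  # prints None (expired after 30 s)
print(cache.get("user:42"))    # prints {'name': 'Alice'}
```

After one minute the cached page has expired and must be rebuilt, while the user entity is still served from the cache.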

It's a limited resource, and Google will optimize it across multiple customers/applications for the benefit of all. If your app doesn't get hit and another app needs more memcache, it would be rational for Google to evict your cached entries.

Richard Watson