
My model has different entities that I'd like to calculate only once, like the employees of a company. To avoid running the same query again and again, the calculated list is saved in Memcache (duration = 1 day). The problem is that the app sometimes gives me an error that more bytes are being stored in Memcache than is permissible:

Values may not be more than 1000000 bytes in length; received 1071339 bytes

Is storing a list of objects something that you should be doing with Memcache? If so, what are the best practices for avoiding the error above? I'm currently pulling 1000 objects. Do you limit values to < 200? Checking an object's size in memory doesn't seem like a good idea, because the objects are probably being processed (serialized or something like that) before going into Memcache.

Glenn
David Haddad
  • The first time I read the title of the question I thought that Memcache here can only store 1M as in 1 million values. Can the title be changed to "Avoiding Memcache 1MB limit of values"? – Ronnie Beltran Jun 30 '15 at 07:00

4 Answers


David, you don't say which language you use, but in Python you can do the same thing as Ibrahim suggests using pickle. All you need to do is write two little helper functions that read and write a large object to memcache. Here's an (untested) sketch:

import pickle
from google.appengine.api import memcache

def store(key, value, chunksize=950000):
  # Pickle once, then split into chunks small enough for the 1 MB value limit.
  serialized = pickle.dumps(value, 2)
  values = {}
  for i in xrange(0, len(serialized), chunksize):
    values['%s.%s' % (key, i // chunksize)] = serialized[i:i + chunksize]
  return memcache.set_multi(values)

def retrieve(key):
  # Fetch up to 32 chunks in one round trip and reassemble them in index
  # order (sorting the key strings would put 'key.10' before 'key.2').
  keys = ['%s.%s' % (key, i) for i in xrange(32)]
  result = memcache.get_multi(keys)
  serialized = ''.join([result[k] for k in keys if k in result])
  return pickle.loads(serialized)
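As a usage sketch (the Employee model, the query, and the cache key below are made up for illustration), the helpers drop in wherever a plain memcache set/get would otherwise hit the 1 MB limit:

# Hypothetical usage -- Employee and the cache key are placeholders.
employees = Employee.all().fetch(1000)   # the expensive query, run once
store('acme:employees', employees)       # written as several < 1 MB chunks

# ... on a later request ...
employees = retrieve('acme:employees')   # chunks reassembled and unpickled

As sketched, retrieve will raise if the chunks have expired or been evicted, so in practice you would guard the call and fall back to re-running the query.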
Eyal Levin
Guido van Rossum
  • Thanks Guido for the sketch, it gives me the right basis to move forward. – David Haddad Feb 06 '12 at 16:01
  • I had a similar problem as @Nikolay, the pickle.loads line was failing because the values were coming back in different order than they had been pickled. I used serialized_data = [result[key] for key in keys if key in result and result[key] is not None] to solve my issue. – John Mar 31 '14 at 12:51
  • I am having trouble understanding this code completely. May I ask someone to explain what it does step-by-step? – hyang123 Jul 17 '14 at 15:47
  • The problem is that when you want to store 10 key/values, for example, Memcached only stores the last 4 key/values (in the list). So when I try to retrieve the values, it only returns 4 key/values (miss count = 6), which is not enough to reconstruct the serialized object/list. – Ashkan Aug 05 '14 at 16:22

I frequently store objects several megabytes in size in memcache. I cannot comment on whether this is good practice or not, but in my opinion we sometimes simply need a relatively fast way to transfer megabytes of data between our App Engine instances.

Since I am using Java, what I did was serialize my raw objects with Java's serializer, producing a serialized byte array. Since the size of the serialized object is then known, I could cut it into 800 KB byte-array chunks. I then encapsulated each chunk in a container object and stored that object instead of the raw objects.

Each container object has a pointer to the next memcache key, where the next byte-array chunk can be fetched, or null if there are no more chunks to fetch from memcache (i.e. just like a linked list). To read the value back, I re-merge the chunks into one large byte array and deserialize it with Java's deserializer.
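Ibrahim's implementation is Java, but the linked-list idea translates directly. Here is a rough, untested Python sketch of the same scheme; the 800 KB chunk size, the key naming, and the container layout are assumptions made for illustration:

import pickle
from google.appengine.api import memcache

CHUNK = 800 * 1024  # stay well under the 1 MB value limit

def store_chain(key, value):
  # Split the pickled bytes into containers, each pointing at the next key.
  data = pickle.dumps(value, 2)
  chunks = [data[i:i + CHUNK] for i in xrange(0, len(data), CHUNK)]
  for i, chunk in enumerate(chunks):
    this_key = key if i == 0 else '%s.%d' % (key, i)
    next_key = '%s.%d' % (key, i + 1) if i + 1 < len(chunks) else None
    memcache.set(this_key, {'data': chunk, 'next': next_key})

def retrieve_chain(key):
  # Follow the chain of pointers until a container has no successor.
  parts, k = [], key
  while k is not None:
    container = memcache.get(k)
    if container is None:
      return None  # a chunk was evicted; the caller should recompute
    parts.append(container['data'])
    k = container['next']
  return pickle.loads(''.join(parts))

Compared with the fixed-range get_multi approach above, the chain costs one round trip per chunk, but it never has to guess how many chunks exist.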

Ibrahim Arief
  • Thanks Ibrahim, it's definitely a creative solution. I'm looking to see what simpler more standard approaches there are for storing object lists in memcache. – David Haddad Feb 03 '12 at 11:57

Do you always need to access all the data you store? If not, you will benefit from partitioning the dataset and accessing only the part of the data you need.

If you display a list of 1000 employees, you are probably going to paginate it. And if you paginate, you can definitely partition.

You can make two lists from your dataset: a light one with just the most essential information, which can fit into 1 MB, and a full one divided into several parts. On the light list you can apply the most common operations, for example filtering by employee name or pagination. Then, when the heavy dataset is needed, you can load only the parts you really require.

These suggestions take time to implement, though. If you can live with your current design, then just divide your list into lumps of ~300 items (or whatever number is safe), load them all, and merge.
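A minimal sketch of that last suggestion (the lump size of 300, the key scheme, and the one-day expiry are arbitrary assumptions, not anything App Engine prescribes):

from google.appengine.api import memcache

def cache_in_lumps(key, items, per_lump=300, time=24 * 3600):
  # Store the list as several small values plus a count of the lumps.
  lumps = {}
  for i in xrange(0, len(items), per_lump):
    lumps['%s:%d' % (key, i // per_lump)] = items[i:i + per_lump]
  memcache.set('%s:count' % key, len(lumps), time=time)
  memcache.set_multi(lumps, time=time)

def load_lumps(key):
  count = memcache.get('%s:count' % key)
  if count is None:
    return None
  keys = ['%s:%d' % (key, i) for i in xrange(count)]
  found = memcache.get_multi(keys)
  if len(found) != count:
    return None  # a lump expired; fall back to the original query
  merged = []
  for k in keys:
    merged.extend(found[k])
  return merged

Returning None whenever any lump is missing keeps the cache all-or-nothing, so a partially evicted list is never merged and returned.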

Ski
  • Spot on ideas Skirmantas! You're right they will take implementation time but they're worth looking into. Thanks for sharing your advice. – David Haddad Feb 03 '12 at 14:18

If you know how large the objects will be, you can use the memcached option to allow larger objects:

memcached -I 10m

This will allow objects up to 10MB.

danius
  • Thanks. Useful to know, but it appears this is not an option on AppEngine which is tagged as the platform of the original question: https://cloud.google.com/appengine/docs/python/memcache/#Python_Limits – Michael Jun 22 '15 at 13:52