
In-memory cache stores require serializing and deserializing complex objects, such as C# POCOs, when storing them.

Is it not possible to just keep the cached data in memory as the original object graph and eliminate this bottleneck? After all, the serialized data is still held in memory, so why not keep the original objects there for the fastest cache possible (and maybe use named pipes to implement a distributed cache)?

Thank you

iAteABug_And_iLiked_it

3 Answers


The caches you mentioned are designed to be used as distributed caches, with loads of features and options. Holding an object, or especially an object tree, in a (global) variable for use in a single process is always faster than loading it from another computer and paying the cost of deserializing it, etc.

But that's not the use case for Redis & Co. As soon as you try to implement your own distributed cache with named pipes (or any other technology), you'll find out why Redis has its right to exist.

In other words: As soon as you move objects over machine boundaries, you'll have to serialize and deserialize them. You might just not know it if the underlying transport protocol does that for you.

Waescher
  • Thanks. Actually, I meant that a distributed cache using named pipes was optional. If the cache was on a single machine (with no load balancing), then you mean that just keeping everything as an object graph in memory would be faster and simpler than using Redis/memcached? – iAteABug_And_iLiked_it Jan 26 '16 at 19:58
  • On a single machine in a single process: yes. If you have multiple processes, you have to share your cached objects as well, and you won't be able to use the same object references unless you use pipes or .NET Remoting or maybe other tech. This might be faster in some cases; however, you're heavily limited in comparison with Redis, and you know that your boss is already on his way to tell you that the specs did change ... – Waescher Jan 26 '16 at 20:04

Keeping objects as-is in memory has a few pros and cons. Let's take a look at them:

Pros:

  • There will be no serialization and deserialization costs.
  • Performing analysis on the cached data will be easier, since every operation saves the serialization cost.
    • This is why, when running MapReduce functions in a distributed cache, objects are often not stored as binary: it keeps the analysis cost down.

Cons (i.e., what you give up by not serializing):

  • Serialization actually decreases the data size by removing .NET metadata overhead (RAM is precious).
  • With serialized entries, the size of your cache can be predicted.
  • Serialized data can be encrypted or compressed if needed.
  • Bytes can be transferred over the network; live object graphs cannot.
  • Custom serializers let you choose exactly what to store.
  • Binary data has no language barrier (if you have your own serializers).

These are just some of the pros and cons that I think are true; I am sure there are more.

Weighing these pros and cons, most if not all distributed caches opt for serialization even when the cache is kept in-process. Serialization costs are also not that high for most use cases.

@Waescher's point also holds: as soon as the network is involved, all types of data must be converted to bytes for transfer.
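Two of the cons above, compression/encryption and predictable cache size, only make sense once the object is a flat byte string. A short sketch (Python with `pickle` and `zlib` as illustrative stand-ins for any serializer and compressor):

```python
import pickle
import zlib

record = {"id": 42, "name": "widget", "tags": ["a", "b"] * 50}

blob = pickle.dumps(record)    # serialized form: a flat, measurable byte string
packed = zlib.compress(blob)   # compression (or encryption) operates on bytes,
                               # not on a live object graph

# Round-trip proves nothing was lost along the way.
assert pickle.loads(zlib.decompress(packed)) == record

# The serialized sizes are exact, so cache capacity can be budgeted.
print(len(blob), "bytes serialized,", len(packed), "bytes compressed")
```

A live object graph offers no equivalent: its true footprint is scattered across the heap, and you cannot run `zlib` over a web of references.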

Basit Anwer

Waescher is right that if there's a single machine and a single process, you can store the object graph in local memory. If there are multiple processes, then it has to be in shared memory, and that opens up a whole can of worms that may or may not be addressed by third-party products such as Redis or memcached. For example, concurrency now has to be managed (i.e., making sure one process isn't reading the graph at the same time another is modifying it, or using a more ambitious lock-free algorithm). In fact, this also has to be addressed in a single multi-threaded process.

Object references in the shared-memory case, if they're raw memory pointers, might still be usable, but only as long as the shared memory segment is mapped to the same address in every process. Depending on the size of the segment and of each process's memory map, that may or may not be possible. Using a system-generated object identifier/reference (e.g., a sequentially increasing 4- or 8-byte integer) instead would obviate that problem.

At the end of the day, if you store the object graph in any repository, it has to be serialized/deserialized into/out of that repository's storage.

Steven Graves