
I'm currently able to store an object I've created into HttpContext.Current.Session, and I've come across protobuf-net. Is there a way to store my object by serializing it with protobuf?

It looks like protobuf wants to write the information to a Stream, so should I (can I?) store a Stream object in the user's session? Or should I first convert it from a Stream into another object type? If so, will converting the serialized object defeat the original purpose of using protobuf (CPU usage, memory usage)? Has anyone done this before?

My goal is to use protobuf as a compression layer for storing information in the user's session. Is there a better way (smaller sizes, faster compression, easier to maintain, smaller implementation overhead) to do this, or is protobuf the right tool for this task?


Update

I'm using this class:

[Serializable]
public class DynamicMenuCache
{
    public System.DateTime lastUpdated { get; set; } // when this entry was cached
    public MenuList menu { get; set; }               // the wrapped menu data
}

This class is a wrapper for my MenuList class, which is (basically) a List of Lists containing built-in types (strings, ints). I've created the wrapper to associate a timestamp with my object.

If I have a session cache miss (the session key is null, or session.lastUpdated is greater than a globally stored time), I do my normal DB lookup (MSSQL), create the MenuList object, and store it in the session, like so:

HttpContext.Current.Session.Add("DynamicMenu" + MenuType, new DynamicMenuCache()
{
    lastUpdated = System.DateTime.Now,
    menu = Menu
});
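
For illustration, the whole get-or-rebuild flow looks roughly like this (a sketch: LoadMenuFromDb and IsStale are made-up placeholders for my real DB lookup and the timestamp check against the globally stored time):

public MenuList GetDynamicMenu(string menuType)
{
    var key = "DynamicMenu" + menuType;
    var cached = HttpContext.Current.Session[key] as DynamicMenuCache;

    // cache miss: nothing stored yet, or the entry is out of date
    if (cached == null || IsStale(cached.lastUpdated))
    {
        var menu = LoadMenuFromDb(menuType); // normal MSSQL lookup
        HttpContext.Current.Session[key] = new DynamicMenuCache
        {
            lastUpdated = System.DateTime.Now,
            menu = menu
        };
        return menu;
    }

    return cached.menu;
}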

Currently our session is stored in memory, but we might move to a DB session store in the future.

Our session usage is pretty heavy, as we store lots of large objects in it (although I hope to clean up what we store in the session at some future point).

For example, we store each user's permission set in their session to avoid the database hit. There are lots of permissions and permission-storing structs that currently get stored in the session.

At this point I'm just exploring the options available, as I'd like to make more intelligent and rigorous use of the session cache in the future.

Please let me know if there is anything else you need.

JesseBuesking
  • I guess it depends on the complexity/size of your state object: the bigger the object, the more you'll benefit from protobuf. For simple objects it could be a penalty to have double serialization. edit: you could benchmark a memcached + protobuf implementation vs a classic asp.net session state server for different session objects, it would be interesting :) – Guillaume86 Dec 13 '11 at 17:25
  • Hi JesseB - what is your current setup? What session provider are you using currently? In-memory? SQL? Or...? Also: how extensive is your session usage? Also, what sorts of things are you storing in session? (so I can give the most appropriate guidance) – Marc Gravell Dec 13 '11 at 18:01
  • just noticed the update (note: I don't get automatic notifications - you might want to add a `@marc` comment so I see it ;p). Kinda late here and pretty tired - can I promise to read this tomorrow and respond then? – Marc Gravell Dec 13 '11 at 21:56
  • @MarcGravell Whoops my bad ;p Yeah that's cool. I'm still in the `research phase`, so nothing will be set in stone for a while. I'm just shopping around for something that'll work well. – JesseBuesking Dec 13 '11 at 22:01

1 Answer


Note that using protobuf-net here really only makes sense if you are looking at moving to a persisted state provider at some point.

Firstly, since you are using in-memory at the moment (so the types are not serialized, AFAIK), some notes on changing session to use any kind of serialization-based provider:

  • the types must be serializable by the provider (sounds obvious, but this has particular impact if you have circular graphs, etc)
  • because data is serialized, the semantic is different; you get a copy each time, meaning that any changes you make during a request are lost - this is fine as long as you make sure you explicitly re-store the data again, and can avoid some threading issues - double-edged
  • the inbuilt state mechanisms typically retrieve session as a single operation - which can be a problem if (as you mention) you have some big objects in there; nothing to do with protobuf-net, but I once got called in to investigate a dying server, which turned out to be a multi-MB object in state killing the system, as every request (even those not using that piece of data) caused this huge object to be transported (both directions) over the network

In many ways, I'm actually simply not a fan of the standard session-state model - and this is before I even touch on how it relates to protobuf-net!

protobuf-net is, ultimately, a serialization layer. Another feature of the standard session-state implementation is that because it was originally written with BinaryFormatter in mind, it assumes that the objects can be deserialized without any extra context. protobuf-net, however, is (just like XmlSerializer, DataContractSerializer and JavaScriptSerializer) not tied to any particular type system - it takes the approach "you tell me what type you want me to populate, I'll worry about the data". This is actually a hugely good thing, as I've seen web-servers killed by BinaryFormatter when releasing new versions, because somebody had the audacity to touch even slightly one of the types that happened to relate to an object stored in persisted session. BinaryFormatter does not like that; especially if you (gasp) rename a type, or (shock) make something from a field+property to an automatically-implemented-property. Hint: these are the kinds of problems that google designed protobuf to avoid.
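
To make that concrete, here is a minimal sketch (a made-up type, not one from the question): protobuf encodes numeric tags rather than member names, so the kind of refactoring that breaks BinaryFormatter leaves the wire data readable.

[ProtoContract]
public class UserDto
{
    // only the tag numbers (1, 2) go on the wire, never the names,
    // so renaming Name to FullName later does not break stored data
    [ProtoMember(1)]
    public string Name { get; set; }

    [ProtoMember(2)]
    public int Id { get; set; }
}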

However! That does mean that it isn't hugely convenient to use with the standard session-state model. I have implemented systems to encode the type name into the stream before (for example, I wrote an enyim/memcached transcoder for protobuf-net), but... it isn't pretty. IMO, the better way to do this is to transfer the burden of knowing what the data is to the caller. I mean, really... the caller should know what type of data they are expecting in any given key, right?

One way to do this is to store a byte[]. Pretty much any state implementation can handle a BLOB. If it can't handle that, just use Convert.ToBase64String / Convert.FromBase64String to store a string - any implementation not handling string needs shooting! To use with a stream, you could do something like this (a sketch, using the Session indexer as a stand-in for whatever provider you have):

public static T GetFromState<T>(string key) {
    // fetch the BLOB via the standard state provider
    byte[] blob = (byte[])HttpContext.Current.Session[key];
    if (blob == null) return default(T);
    using (var ms = new MemoryStream(blob)) {
        return Serializer.Deserialize<T>(ms);
    }
}

(and similar for adding)
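
For completeness, the adding direction might look like this (the same assumptions as above - the state provider just sees an opaque byte[]):

public static void AddToState<T>(string key, T value) {
    using (var ms = new MemoryStream()) {
        Serializer.Serialize(ms, value);
        // standard state provider set by key
        HttpContext.Current.Session[key] = ms.ToArray();
    }
}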

Note that protobuf-net is not the same as BinaryFormatter - they have different expectations of what is reasonable; for example, by default protobuf-net expects to know in advance what the data looks like (i.e. public object Value {get;set;} would be a pain), and doesn't handle circular graphs (although there are provisions in place to support both of these scenarios). As a general rule of thumb: if you can serialize your data with something like XmlSerializer or DataContractSerializer, it will serialize easily with protobuf-net; protobuf-net supports additional scenarios too, but doesn't make an open guarantee to serialize every arbitrary data model. Thinking in terms of DTOs will make life easier. In most cases this isn't a problem at all, since most people have reasonable data. Some people do not have reasonable data, and I just want to set expectations appropriately!
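
Applied to the question's wrapper type, the markup might look like this (a sketch - the tag numbers are arbitrary choices, and MenuList and its contents would need similar attributes):

[ProtoContract]
public class DynamicMenuCache
{
    [ProtoMember(1)]
    public System.DateTime lastUpdated { get; set; }

    [ProtoMember(2)]
    public MenuList menu { get; set; }
}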

Personally, though, as I say - especially when large objects can get involved - I'm simply not a fan of the inbuilt session-state pattern. What I might suggest instead is using a separate per-key data store (meaning: one record per user per key, rather than just one record per user) - maybe just for the larger objects, maybe for everything. This could be SQL Server, or something like redis/memcached. This is obviously a bit of a pain if you are using 3rd-party controls (webforms etc) that expect to use session-state, but if you are using state manually in your code, it is pretty simple to implement. FWIW, BookSleeve coupled to redis works well for things like this, and provides decent access to byte[]-based storage. From a byte[] you can deserialize the object as shown above.
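
A sketch of that per-key shape, assuming a hypothetical IBlobStore abstraction (it could be backed by redis via BookSleeve, memcached, or a SQL table keyed on user + key; IBlobStore and PerKeyUserState are made-up names, not from any real library):

// hypothetical abstraction over whichever store you choose;
// the point is one record per user per key, fetched on demand
public interface IBlobStore {
    byte[] Get(string key);
    void Set(string key, byte[] value);
}

public class PerKeyUserState {
    private readonly IBlobStore store;
    public PerKeyUserState(IBlobStore store) { this.store = store; }

    // one record per user per key, rather than one blob per user
    private static string MakeKey(string userId, string key) {
        return userId + ":" + key;
    }

    public T Get<T>(string userId, string key) {
        var blob = store.Get(MakeKey(userId, key));
        if (blob == null) return default(T);
        using (var ms = new MemoryStream(blob)) {
            return Serializer.Deserialize<T>(ms);
        }
    }

    public void Set<T>(string userId, string key, T value) {
        using (var ms = new MemoryStream()) {
            Serializer.Serialize(ms, value);
            store.Set(MakeKey(userId, key), ms.ToArray());
        }
    }
}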

Anyway - I'm going to stop there, in case I'm going too far off-topic; feel free to ping back with any questions, but executive summary:

  • protobuf-net can stop a lot of the versioning issues you might see with BinaryFormatter
  • but it isn't necessarily a direct 1:1 swap, since protobuf-net doesn't encode "type" information (which the inbuilt session mechanism expects)
  • it can be made to work, most commonly with byte[]
  • but if you are storing large objects, you may have other issues (unrelated to protobuf-net) related to the way session-state wants to work
  • for larger objects in particular, I recommend using your own mechanism (i.e. not session-state); the key-value-store systems (redis, memcached, AppFabric cache) work well for this
Marc Gravell
  • Awesome explanation! I have a few questions after reading this and then doing a trial with my code. What techniques should I be aware of for improving ser/deser time? I can see protobuf eating up large (relative) amounts of time on seemingly random occasions. I've read [this](http://stackoverflow.com/a/2970411/435460), but I'm going to have to call that on every page load. That sounds like it's meant for when you'll repeatedly call ser/deser back-to-back. Also, protobuf is using less memory as well, but can you give any tips for maximizing this reduction? Thanks in advance! You rock! – JesseBuesking Dec 14 '11 at 14:55
  • @JesseB you don't need to call that on every page load; just once - and even that is optional (in both v1 and v2 it will compile-on-first-demand by default). "Maximising the reduction" depends on the data, I'm afraid - using "packed" encoding for lists of primitives can help (see: `IsPacked` on `[ProtoMember]`), and using "grouped" encoding on sub-objects (either individual or list) can help with *serialize* time. But to give a specific answer I'd need a very specific scenario. Otherwise it is akin to "how can I make my C# faster?" – Marc Gravell Dec 14 '11 at 15:04
  • I just read through http://code.google.com/apis/protocolbuffers/docs/encoding.html which helps to make sense of how the information is encoded. I did notice that after first use the overhead seemed to go away, so it makes sense that it's compiling on first use. Thanks again for helping me make sense of all of this! Thank you also for porting this awesome serializer! You deserve at least a high-five, or perhaps something more badass ;p – JesseBuesking Dec 14 '11 at 15:58