
I have just been reading about how BigMemory allows Java systems to scale up rather than out.

About BigMemory:

BigMemory gives Java applications instant, effortless access to a large memory footprint, free of the constraints of garbage collection.

BigMemory is pure Java and provides an in-process, off-heap cache that lets you store large amounts of data—up to a terabyte—closer to your application.

This breakthrough solution improves memory utilization and application performance with both standalone and distributed caching.

So how do I do the same with .NET, i.e. get an in-process, off-heap cache? (Note that the ASP.NET cache lives on the garbage-collected heap.)

Ian Ringrose
  • The CLR already has this feature built in; it is called the Large Object Heap. – Hans Passant Aug 25 '11 at 11:55
  • @Ian Exactly what do you want to do that you can't already do? What problem are you looking to solve? – Tim Lloyd Aug 25 '11 at 12:14
  • Look at memcached or Velocity. Here is an SO question to start with: http://stackoverflow.com/q/397824/157224 – adrianm Aug 25 '11 at 12:17
  • @adrianm The OP wants something that is "in-process". – Tim Lloyd Aug 25 '11 at 12:17
  • @Hans, the large object heap is still garbage collected and has to be scanned by a full GC. It also only holds **large** objects. Try putting 128GB of data in the Large Object Heap and see what it does to the GC... – Ian Ringrose Aug 25 '11 at 12:18
  • @Ian I think there is a maximum size for .NET objects of around 2GB each (http://stackoverflow.com/questions/2338778/what-is-the-maximum-length-of-an-array-in-net-on-64-bit-windows/2338797#2338797 ), but each object can reference other objects. I think that if you serialize your objects to binary types you won't have too many problems, but I don't have 128GB of memory to test it :-) – xanatos Aug 25 '11 at 12:20
  • You could consider serializing and deserializing your objects and putting them on the unmanaged heap (which would be in-process) if you want manual memory management. The performance could be terrible if you have to do this frequently though... – Tim Lloyd Aug 25 '11 at 12:24
  • @chibacity Is accessing unmanaged memory slow in C#? I thought it was only using unmanaged methods that was slow. – xanatos Aug 25 '11 at 12:30
  • @xanatos It's the "serializing and deserializing" that's the issue. – Tim Lloyd Aug 25 '11 at 12:32
  • @chibacity It's my impression that BigMemory works with serialized objects. Am I wrong? – xanatos Aug 25 '11 at 16:27
  • @xanatos I am making a general observation. Serialization and deserialization are relatively slow. But yes, BigMemory does indeed use this technique for shuttling objects back and forth from an unmanaged heap. I have given an answer below which displays rudimentary aspects of this, working with an in-process unmanaged heap. – Tim Lloyd Aug 25 '11 at 16:36

2 Answers

No, there is no BigMemory equivalent for .NET (i.e. an in-process, non-GC heap memory manager). However, you could roll your own.

You could use an unmanaged heap to get an in-process heap that is not garbage collected. However, if you are working with objects rather than raw memory, you'll have to serialize and deserialize them, which is slow.

You'll need to keep a lookup of heap infos (pointers into the unmanaged heap) so you can retrieve your objects. This lookup has its own memory overhead, so the approach is not suitable for a huge number of very small objects, because:

a. A lot of memory will be taken up by management objects.
b. The GC will go berserk scanning the management objects.

If the objects are large enough and there are not too many of them, this could work for you.

However, you could push some of the management info into the unmanaged heap too. There are lots of optimization opportunities.

This can all be wrapped up to work like a key/value cache, abstracting away the heap infos and the heap itself.

Updated

Updated the sample code to use Protobuf, which does binary serialization significantly faster than the built-in .NET serialization. This simple sample can Put+Get 425k objects per second, with a key/value wrapper. Your mileage will vary depending on object size and complexity.

The object size is stored in the unmanaged heap to reduce memory consumption on the managed heap.
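The length-prefix layout can also be looked at in isolation. The following is only a minimal sketch, assuming each block is laid out as a 4-byte size followed by the payload; the `LengthPrefixedBlock` class and its method names are hypothetical and exist purely for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of any library: stores a byte[] in unmanaged
// memory as [4-byte length][payload], so no size has to be kept on the managed heap.
public static class LengthPrefixedBlock
{
    private const int SizeFieldWidth = sizeof(int);

    // Allocates an unmanaged block and copies the payload in after the length prefix.
    public static IntPtr Write(byte[] payload)
    {
        IntPtr ptr = Marshal.AllocHGlobal(SizeFieldWidth + payload.Length);
        Marshal.WriteInt32(ptr, payload.Length);
        Marshal.Copy(payload, 0, IntPtr.Add(ptr, SizeFieldWidth), payload.Length);
        return ptr;
    }

    // Reads the length prefix, then copies the payload back into a managed array.
    public static byte[] Read(IntPtr ptr)
    {
        int size = Marshal.ReadInt32(ptr);
        byte[] payload = new byte[size];
        Marshal.Copy(IntPtr.Add(ptr, SizeFieldWidth), payload, 0, size);
        return payload;
    }

    // The caller owns the block and must free it explicitly; the GC never will.
    public static void Free(IntPtr ptr)
    {
        Marshal.FreeHGlobal(ptr);
    }
}
```

A caller would pair `Write` with `Read` and finish with `Free`; the only thing that has to stay on the managed heap is the `IntPtr` itself.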

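using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.InteropServices;
using System.Runtime.Serialization;
using NUnit.Framework;
using ProtoBuf;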

[TestFixture]
public class UnmanagedHeap
{
    [Test]
    public void UnmanagedHeapAccess()
    {
        const int Iterations = 425 * 1000;
        const string Key = "woo";

        Bling obj = new Bling { Id = -666 };
        Cache cache = new Cache();
        Stopwatch sw = Stopwatch.StartNew();

        for (int i = 0; i < Iterations; i++)
        {
            cache.Put(Key, obj);

            obj = cache.Get<Bling>(Key);
        }

        cache.Remove(Key);

        Console.WriteLine(sw.Elapsed.TotalMilliseconds);
    }

    [DataContract]
    public class Bling
    {
        [DataMember(Order = 1)]
        public int Id { get; set; }
    }

    public class Cache
    {
        // Every unmanaged block starts with a 4-byte size field, followed by the serialized object.
        private const int SizeFieldWidth = 4;

        private readonly Dictionary<string, IntPtr> _lookup = new Dictionary<string, IntPtr>();

        public void Put(string key, object obj)
        {
            // TryGetValue leaves oldPtr as IntPtr.Zero when the key is not present.
            IntPtr oldPtr;
            _lookup.TryGetValue(key, out oldPtr);

            IntPtr newPtr = SerializeToHeap(obj, oldPtr);

            _lookup[key] = newPtr;
        }

        public T Get<T>(string key)
        {
            IntPtr ptr = _lookup[key];

            return DeserializeFromHeap<T>(ptr);
        }

        public void Remove(string key)
        {
            IntPtr ptr = _lookup[key];

            Marshal.FreeHGlobal(ptr);

            _lookup.Remove(key);
        }

        private static IntPtr SerializeToHeap(object obj, IntPtr oldPtr)
        {
            using (MemoryStream ms = new MemoryStream())
            {
                Serializer.Serialize(ms, obj);
                byte[] objBytes = ms.GetBuffer();
                int newSize = (int)ms.Length;
                bool requiresAlloc = true;

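                if (oldPtr != IntPtr.Zero)
                {
                    int oldSize = GetObjectSize(oldPtr);

                    requiresAlloc = (oldSize != newSize);

                    // The old block cannot be reused, so free it to avoid leaking unmanaged memory.
                    if (requiresAlloc)
                    {
                        Marshal.FreeHGlobal(oldPtr);
                    }
                }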

                IntPtr newPtr = requiresAlloc ? Marshal.AllocHGlobal(newSize + SizeFieldWidth) : oldPtr;

                byte[] sizeField = BitConverter.GetBytes(newSize);
                Marshal.Copy(sizeField, 0, newPtr, SizeFieldWidth);
                Marshal.Copy(objBytes, 0, newPtr + SizeFieldWidth, newSize);
                return newPtr;
            }
        }

        private static T DeserializeFromHeap<T>(IntPtr ptr)
        {
            int size = GetObjectSize(ptr);
            byte[] objBytes = new byte[size];
            Marshal.Copy(ptr + SizeFieldWidth, objBytes, 0, size);

            using (MemoryStream ms = new MemoryStream(objBytes))
            {
                return Serializer.Deserialize<T>(ms);
            }
        }

        private static int GetObjectSize(IntPtr ptr)
        {
            byte[] sizeField = new byte[SizeFieldWidth];
            Marshal.Copy(ptr, sizeField, 0, SizeFieldWidth);
            int size = BitConverter.ToInt32(sizeField, 0);
            return size;
        }
    }
}
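
The sample never frees blocks that are still in the cache when the `Cache` instance goes out of scope. One possible way to address that, shown here only as a sketch, is to hold each block in a `SafeHandle` subclass instead of a raw `IntPtr`. The `UnmanagedBlock` class below is hypothetical and not part of the sample above.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper: owns one AllocHGlobal block and frees it on Dispose
// or, as a fallback, when the handle is finalized.
public sealed class UnmanagedBlock : SafeHandle
{
    public UnmanagedBlock(int size) : base(IntPtr.Zero, true)
    {
        SetHandle(Marshal.AllocHGlobal(size));
    }

    // A zero pointer means nothing was allocated, so there is nothing to release.
    public override bool IsInvalid
    {
        get { return handle == IntPtr.Zero; }
    }

    // Called once by the runtime when the handle is disposed or finalized.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }
}
```

The trade-off is one extra managed object per cache entry, which adds to the management overhead the GC has to scan, so this fits better with fewer, larger entries.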
Tim Lloyd
  • For anyone checking this out in the present, I would recommend looking at Microsoft Bond instead of Protobuf – Chris Marisic Apr 27 '15 at 21:29
  • I think you have an unmanaged memory leak any time you call `Put` twice with the same key with an object that has `requiresAlloc == true`. You need an `if (requiresAlloc) Marshal.FreeHGlobal(oldPtr)` inside the `if (oldPtr != IntPtr.Zero)` block. You also have a nasty memory leak if a `Cache` is not empty and you let it go out of scope. Switching your `IntPtr` to a `SafeHandle` that calls `Marshal.FreeHGlobal` on cleanup would fix that (making the cache disposable and having the dispose empty it would be a nice feature too). – Scott Chamberlain Aug 09 '15 at 04:48

Yes, there is, and in 100% managed code: NFX Pile. ProtoBuf, suggested in the answer above, will not give you 100% transparency, as it does not map polymorphic references and cycles properly and it requires special attributes. NFX Pile does not require anything other than [Serializable].

https://github.com/aumcode/nfx

https://github.com/aumcode/nfx/blob/master/Source/NFX/ApplicationModel/Pile/IPile.cs

https://github.com/aumcode/nfx/blob/master/Source/NFX/ApplicationModel/Pile/ICache.cs

see videos: https://www.youtube.com/watch?v=IUBF2Ncvbbs

https://www.youtube.com/watch?v=Dz_7hukyejQ

Apache 2.0

itadapter DKh