
From this question, I thought I could get around the 2 GB collection size limit by creating a BigList<T> type using the following pattern (and, by the way, this limit seems to be imposed by default on x86 applications, if you are curious about trying it out):

using Microsoft.Win32;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace RegistryHawk
{
    class Program
    {

        struct RegistryPath
        {
            public RegistryView View;
            public string Path;
            public bool IsKey;
            public RegistryValueKind ValueKind;
            public string ValueName;
            public object Value;
            public int HashValue;
        }

        public class BigList<T>
        {
            object listLock = new object();
            List<List<T>> Items = new List<List<T>>();
            int PageSize = 1000000; // Tweak this to be the maximum size you can grow each individual list before reaching the 2 GB size limit of .NET.
            public ulong Count = 0;
            int listCount = 0;

            public BigList()
            {
                Items.Add(new List<T>());
            }

            public void Add(T item)
            {
                lock (listLock)
                {
                    if (Items[listCount].Count == PageSize)
                    {
                        Items.Add(new List<T>());
                        listCount++;
                    }
                    Items[listCount].Add(item);
                    Count++;
                }
            }
        }

        static void Main(string[] args)
        {
            BigList<RegistryPath> snapshotOne = new BigList<RegistryPath>();
            WalkTheRegistryAndPopulateTheSnapshot(snapshotOne);
            BigList<RegistryPath> snapshotTwo = new BigList<RegistryPath>();
            WalkTheRegistryAndPopulateTheSnapshot(snapshotTwo);
        }

        private static void WalkTheRegistryAndPopulateTheSnapshot(BigList<RegistryPath> snapshot)
        {
            List<ManualResetEvent> handles = new List<ManualResetEvent>();
            foreach (RegistryHive hive in Enum.GetValues(typeof(RegistryHive)))
            {
                foreach (RegistryView view in Enum.GetValues(typeof(RegistryView)).Cast<RegistryView>().ToList().Where(x => x != RegistryView.Default))
                {
                    ManualResetEvent manualResetEvent = new ManualResetEvent(false);
                    handles.Add(manualResetEvent);
                    new Thread(() =>
                    {
                        WalkKey(snapshot, view, RegistryKey.OpenBaseKey(hive, view));
                        manualResetEvent.Set();
                    }).Start();
                }
            }
            ManualResetEvent.WaitAll(handles.ToArray());
        }

        private static void WalkKey(BigList<RegistryPath> snapshot, RegistryView view, RegistryKey key)
        {
            RegistryPath path = new RegistryPath { View = view, Path = key.Name, HashValue = (view.GetHashCode() ^ key.Name.GetHashCode()).GetHashCode() };
            snapshot.Add(path);
            string[] valueNames = null;
            try
            {
                valueNames = key.GetValueNames();
            }
            catch { }
            if (valueNames != null)
            {
                foreach (string valueName in valueNames)
                {
                    RegistryValueKind valueKind = RegistryValueKind.Unknown;
                    try
                    {
                        valueKind = key.GetValueKind(valueName);
                    }
                    catch { }
                    object value = key.GetValue(valueName);
                    RegistryPath pathForValue = new RegistryPath { View = view, Path = key.Name, ValueKind = valueKind, ValueName = valueName, Value = value, HashValue = (view.GetHashCode() ^ key.Name.GetHashCode() ^ valueKind.GetHashCode() ^ valueName.GetHashCode()).GetHashCode() };
                    snapshot.Add(pathForValue);
                }
            }
            string[] subKeyNames = null;
            try
            {
                subKeyNames = key.GetSubKeyNames();
            }
            catch { }
            if (subKeyNames != null)
            {
                foreach (string subKeyName in subKeyNames)
                {
                    try
                    {
                        WalkKey(snapshot, view, key.OpenSubKey(subKeyName));
                    }
                    catch { }
                }
            }
        }
    }
}

However, the CLR still triggers a System.OutOfMemoryException. It isn't thrown anywhere in my code, but program execution stops entirely at around 2 GB of RAM, and when I pause the program in Visual Studio, it shows that an out-of-memory exception was thrown whenever I try to view the state of variables in any thread of the application. It never happens on the first call to WalkTheRegistryAndPopulateTheSnapshot(snapshotOne);, but when the second call to WalkTheRegistryAndPopulateTheSnapshot(snapshotTwo); proceeds, it ends up stopping program execution at around 2 GB of overall RAM usage across my collections. The entire code is posted, so if you have a beefy registry you can probably reproduce this in an x86 console application. Is there something I failed to grasp here, or is this pattern not a valid way to get around the 2 GB collection size limit that the other question seems to play up?

  • That will avoid the 2 GB limitation on a single object, but you're still faced with the fact that a 32-bit application is limited to 2 GB of total memory. Or perhaps 3 GB if it's large address aware. – Jim Mischel Nov 16 '14 at 05:00
  • @JimMischel Oh, I did not know that overall a 32-bit application is limited to 2 GB. But then why do people try to surpass this limit for single objects? Isn't that extremely redundant since the application itself has this limit imposed on it? – Alexandru Nov 16 '14 at 05:02
  • @Alexandru: Well, they're not targeting 32-bit architectures :-) Plus, as Jim explains, it avoids requiring a large contiguous section of free memory. – Cameron Nov 16 '14 at 05:21
  • @Cameron Well if they're targeting 64-bit, it seems totally unreasonable not to use .NET 4.5+ for this application with the .config change. – Alexandru Nov 16 '14 at 14:01

1 Answer


I'm going to expand on my comment. If you're writing a 32-bit app, you have some serious memory constraints when you're working with large amounts of data.

The most important thing to remember is that the 32-bit application is limited to an absolute maximum of 2^32 bytes (4 GB) of memory. In practice, it's usually 2 GB, or perhaps 3 GB if you have that much memory and the application is large address aware.
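To check which case applies to a given process, you can inspect the process bitness and its current virtual memory usage. A minimal sketch using standard BCL calls (illustrative only, not part of the code in the question):

using System;
using System.Diagnostics;

class BitnessInfo
{
    static void Main()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        Console.WriteLine("Pointer size:   {0} bytes", IntPtr.Size);
        Console.WriteLine("64-bit process: {0}", Environment.Is64BitProcess);
        Console.WriteLine("64-bit OS:      {0}", Environment.Is64BitOperatingSystem);

        using (Process current = Process.GetCurrentProcess())
        {
            // Virtual memory currently allocated to this process, in bytes.
            Console.WriteLine("Virtual memory in use: {0:N0} bytes", current.VirtualMemorySize64);
        }
    }
}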

There's also the .NET imposed 2 GB limit, which limits the size of any single object to no more than 2 GB. It's rare that you'll encounter this limit in a 32-bit program, simply because, even on a machine that has more than 2 GB of memory, it's unlikely that there will be a contiguous chunk of memory that's 2 GB in size.
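You can see the effect of fragmentation directly by doubling the size of a single array allocation until it fails. A hedged sketch (illustrative only; the result depends heavily on how fragmented the process address space already is):

using System;

class ContiguousBlockProbe
{
    static void Main()
    {
        long size = 64L * 1024 * 1024; // start at 64 MB
        long largest = 0;

        // Each new byte[size] needs one *contiguous* block of that size,
        // so in a fragmented 32-bit process this loop stops well short of 2 GB.
        while (size < 2L * 1024 * 1024 * 1024)
        {
            try
            {
                byte[] block = new byte[size];
                largest = size;
                block = null;
                GC.Collect();
                size *= 2;
            }
            catch (OutOfMemoryException)
            {
                break;
            }
        }

        Console.WriteLine("Largest contiguous block allocated: {0:N0} bytes", largest);
    }
}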

The 2 GB limit also exists in 64-bit versions of .NET, unless you're running .NET 4.5 and use the app.config setting that enables large objects.
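For reference, this is the setting in question; it only takes effect in a 64-bit process on .NET 4.5 or later (a sketch of a minimal app.config):

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <runtime>
    <!-- Allow single objects (e.g. arrays) larger than 2 GB; 64-bit, .NET 4.5+ only. -->
    <gcAllowVeryLargeObjects enabled="true" />
  </runtime>
</configuration>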

As for why something like BigList is useful in 32-bit programs, it's a way to get around requiring a single contiguous block of memory. For example, a List<int> with 250 million items requires a gigabyte: a contiguous block of memory that's 1 GB in size. But if you use the BigList trick (as you did in your code), then you need 250 individual blocks of memory that are 4 MB in size. It's a whole lot more likely that you'll find 250 blocks of 4 MB than a single 1 GB block.
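As an aside, a paged list like the BigList<T> in the question also needs a way to read items back out. One possible indexer (a sketch only, reusing the listLock, Items, and PageSize members shown above; the casts are safe because each page holds at most PageSize elements):

public T this[long index]
{
    get
    {
        lock (listLock)
        {
            // Map the global index onto (page, offset within that page).
            int page = (int)(index / PageSize);
            int offset = (int)(index % PageSize);
            return Items[page][offset];
        }
    }
}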

  • "if you have that much memory" -- while the rest of your comments are mostly accurate (you can also enable large objects in the program .config), it's not true that the amount of installed RAM on a machine has anything to do with how much memory an application can use. An application's virtual address space can exceed the physical RAM available; the memory managed simply swaps some of the application's data out to the disk. – Peter Duniho Nov 16 '14 at 05:27
  • Anybody remember the `/3gb` windows switch in `boot.ini`? – John Alexiou Nov 16 '14 at 05:33
  • @PeterDuniho: Thanks for the correction on the large objects. I meant app.config. Don't know why I said registry. As for the memory thing ... you're right: I should have been specific and said virtual address space rather than imply that the limit is based on RAM. Whatever the case, a 32-bit app is limited to a maximum of 4 GB virtual address space. Usually 2 GB or 3 GB, though, under Windows. – Jim Mischel Nov 16 '14 at 05:34
  • @JimMischel When dealing with a lot of big and bulk data and a requirement to support 32-bit applications, is it best to just persist this data to the disk using mechanisms like relational database models, SQLite, etc., or do you happen to know of any mechanisms to get more memory? I guess I could create another process and have them talk to each other too. – Alexandru Nov 16 '14 at 14:05
  • @Alexandru: The answer to your question is, "it depends." I don't know enough about your application to make a recommendation. Note, however, that on a 32-bit system (i.e. a box running a 32-bit version of Windows), you will be limited to 4 GB of virtual address space, total. That is, if you have two processes running, between them you won't be able to access more than 4 GB. (Because the OS is limited to a 4 GB address space.) – Jim Mischel Nov 16 '14 at 14:09
  • Ah, good point. Disk it is then. Unfortunately I'll never be able to query more than that amount of data into memory, which means I would always need to query in chunks and do work on chunks of data. – Alexandru Nov 16 '14 at 14:18