2

If I search online for "c++ list" I get a good summary of how a list in c++ works, and its benefits / drawbacks, and such. However I can't find the same for C#. As far as I'm aware the name list doesn't say anything on how it is implemented and it can vary greatly what a list actually is in a language.

I need to load in a lot of files with lots of data into preferably one array for fast random access of all the data. However its too much data and c# can't find a continuous block of memory big enough. So I was going to create an abstraction that took many arrays and would act as one. It would have an indexer property that would see about accessing the correct array.

However then I thought, isn't that how a list actually works in c#? All I know about lists in c#, or at least think, is that they don't work like a linked list where there is no way to access a random element but from the previous element or perhaps from the element after it.

Can I get some detail on this matter?

Dimension
  • 665
  • 1
  • 8
  • 9
  • 1
    might be able the info your after here http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx – Daniel Apr 02 '13 at 15:59
  • 1
    Did you read [this](http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx)? – PiousVenom Apr 02 '13 at 15:59
  • 1
    For fast random access you want a Hash table, not a list or array, like `Dictionary` – asawyer Apr 02 '13 at 16:00
  • Make sure your definition of list is managed or unmanaged list. they are different issues and should not be mixed together. If you use Managed C++, the C++ list is identical to C# ones, as they both are converted to IMSL, which is "identical" – David Apr 02 '13 at 16:00
  • Download a decompiler like Telerik's JustDecompile or JertBrain's DotPeek and take a look at the implementation yourself! – Paul Sasik Apr 02 '13 at 16:01
  • @asawyer what? - Maybe I worded it incorrectly. All I want is to access say array[5000000] fast. That's not what a hash table would help me do. – Dimension Apr 02 '13 at 16:02
  • I'm going to guess your approach is incorrect. Say you load all the data, and then do not necessarily use all of it...waste of memory, perhaps? - Personally I'd have a list of handles, references etc, to the data, and then load as required using BinaryReader, StreamReader etc. As for the internals of C#'s list implementation, if you're really tech savvy...you could just ILSpy the class yourself and find out what's going on. – Matthew Layton Apr 02 '13 at 16:03
  • @Dimension if you only need an array, use an array: `int[] ints = new int[50];` that int array can store 50 integers. Here's a tutorial on arrays: http://msdn.microsoft.com/en-us/library/aa288453(v=vs.71).aspx –  Apr 02 '13 at 16:03
  • @asawyer, Dictionary is a strongly typed hash table, is it not!? – Matthew Layton Apr 02 '13 at 16:04
  • @series0ne - No I do need all of it. I need 2 arrays, one by date, one with values. The one by date gets binary searched, and contains at least 50 million elements. After finding a location, I'll use perhaps 100 thousand of those elements. Per one binary search. – Dimension Apr 02 '13 at 16:07
  • @MarkusMeskanen - Like I said, the array is too big to use as one array. Of course that means I'm trying to open a larger array than 50 elements. Perhaps 50 million instead. – Dimension Apr 02 '13 at 16:08
  • @series0ne You need to to separate the general CS concept of a hash table from the `HashTable` class in C#. asawyer was refering to hash table in the general CS sense of a hash based data structure, of which `Dictionary` is one implementation of it in C#. The `HashTable` class in C# really shouldn't be used anymore. – Servy Apr 02 '13 at 16:09
  • @Servy, I'd be interested to know how they differ when you get down to the nitty gritty of it. But my understanding is that Dictionary is the preferred mechanism in C# as this can be strongly typed. Does dictionary actually hash any stored entries, or is it simply a list of key/value pairs? – Matthew Layton Apr 03 '13 at 08:16
  • @series0ne A `Dictionary` is implemented under the hood using the exact same concept as `HashTable`. `Dictionary` only differs in that it is generic. If it was just a list of pairs then it wouldn't really be worth using at all as it would have *terrible* performance. – Servy Apr 03 '13 at 13:51

2 Answers2

8

To answer your initial query - 'List' is backed by an array in c#

From MSDN:

http://msdn.microsoft.com/en-us/library/ms379570(v=vs.80).aspx#datastructures20_1_topic5

List is "a Homogeneous, Self-Redimensioning Array"

The article is very nice and maybe what you are looking for.

Imp: The List in c# is backed by array so the theoretical limit of size would be the limit of the array backing it. Also when using a list, if you know you are going to grow it to a certain size for sure, then it would be a good idea to set the initial capacity large enough for optimizing the performance.

But I have a feeling 'list' is not the solution you are looking for here.

Suggestion: Maybe if you have a limit on file size, you could perhaps use a Dictionary that keyed the files to their contents and use canonical file path as the key with contents stored in a list. This will perhaps be of use to you.

MickJ
  • 2,127
  • 3
  • 20
  • 34
2

I think this is a really good definition, from C# 5.0 in a nutshell book

Internally, List and ArrayList work by maintaining an internal array of objects, replaced with a larger array upon reaching capacity. Appending elements is efficient (since there is usually a free slot at the end), but inserting elements can be slow (since all elements after the insertion point have to be shifted to make a free slot). As with arrays, searching is efficient if the BinarySearch method is used on a list that has been sorted, but is otherwise inefficient because each item must be individually checked.

pollirrata
  • 5,188
  • 2
  • 32
  • 50