0

I have an Excel sheet with 5000 rows × 30 columns, and I want to read it into a C# program and then regularly access the values by looking up both the row and column headers. The program will run on a normal laptop (16 GB RAM). Should I use a Dictionary of Dictionaries to store this data? If so, is it more advisable to create a dictionary of 5000 smaller dictionaries each with 30 key-value pairs (i.e., row headers are the keys of the "outer" dictionary and column headers are the keys of the "inner" dictionaries), or 30 dictionaries each with 5000 key-value pairs? Or should I create a 2D array and store the row-header/row-index pairs and column-header/column-index pairs in two smaller dictionaries? Or do I need to worry about memory/performance at all for this amount of data?
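
To make the first option concrete, this is roughly what I have in mind (the header names below are just placeholders, not my real ones):

// Outer key = row header, inner key = column header (placeholder names).
var data = new Dictionary<string, Dictionary<string, object>>();

data["Row0001"] = new Dictionary<string, object>
{
    ["ColumnA"] = 42,
    ["ColumnB"] = "some text",
    // ... 28 more columns
};

// The kind of lookup I need to do regularly:
object value = data["Row0001"]["ColumnA"];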

Thank you.

  • 4
    You should create a `Model` class whose properties are your 30 columns; each of the 5000 rows then becomes one `Model` instance. So you would have only one `List` with 5000 `Model` items in it. – Auditive Oct 13 '21 at 14:44
  • 3
    5000 rows is _nothing_ for a modern computer, even if the rows are large. – PMF Oct 13 '21 at 14:55
  • 1
    If you are modeling the data in an Excel spreadsheet (that doesn't include merged cells or other _non-rectangular_ features) and you will be accessing them via row and column numbers, then you have the perfect use-case for a 2D array. Then again, this question is opinion-based and will likely close soon (Stackoverflow is allergic to opinions) – Flydog57 Oct 13 '21 at 15:10
  • 1
    A .NET `Dictionary` is a hash table, so a lookup is O(1) on average regardless of the number of items; splitting the data into 30 dictionaries doesn't change that, it only adds one extra lookup to pick the right dictionary. A lookup in a straight list is O(N): on average N/2 comparisons before finding the item. – jdweng Oct 13 '21 at 15:12

3 Answers

2

An example to explain my comment.

Say you have a table with some number of columns and rows:

(Example table: columns Name, Surname, Age, City, Gender; one person per row.)

You can create a class which would represent one object (one entity):

public class Person
{
}

and fill it with properties, one for each of your 30 columns (the same set for every entity):

public class Person
{ 
    public string Name { get; set;}
    public string Surname { get; set;}
    public int Age { get; set;}
    public string City { get; set;}
    public string Gender { get; set;}
    // and other of 30 columns
}

So each row in the table represents one Person (in this example); 5000 rows means we would have 5000 Persons. As they are similar entities, we can store them in a collection of Persons. For a simple example, we can use:

  • Person[] - array of Persons;
  • List<Person> - list of Persons;

Let's look at how to read the data from the table and create a List of Persons from it.

public List<Person> ReadTable(Table myTable) // We have a Table with Rows and Columns
{ 
    // Initializing our collection of Persons
    List<Person> persons = new List<Person>();

    // We need to read all 5000 rows, so iterate over them
    for (int i = 0; i < myTable.Rows.Count; i++) // myTable.Rows.Count = 5000
    {
        // Creating a Person
        Person person = new Person(); 
        
        // Reading each cell value, accessing it through the row index & column index 

        // Column 0 is Name 
        string name = myTable.Rows[i].Columns[0].Value.ToString(); // John
        // Column 1 is Surname
        string surname = myTable.Rows[i].Columns[1].Value.ToString(); // Wick
        // Column 2 is Age
        int age = int.Parse(myTable.Rows[i].Columns[2].Value.ToString()); // 55  
        // Column 3 is City
        string city = myTable.Rows[i].Columns[3].Value.ToString(); // New York 
        // Column 4 is Gender
        string gender = myTable.Rows[i].Columns[4].Value.ToString(); // Male

        // Assign the values read from the table to the Person:
        person.Name = name;
        person.Surname = surname;
        person.Age = age;
        person.City = city;
        person.Gender = gender;
   
        // And finally add the Person to the collection (list) of Persons:
        persons.Add(person);

        // The same happens for the remaining rows                    
    }

    // Return the filled list so callers can use it
    return persons;
}

So after reading you will have a persons list with 5000 items. You can manipulate them however you like: edit them, add new ones, remove existing ones, and so on.
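
If you then need the kind of lookup described in the question (find a row by its header, then read a column), one possible sketch, assuming the surname can serve as a unique row key, is to index the list once with LINQ:

// Requires: using System.Linq; using System.Collections.Generic;
// Build an index once: key = surname (assumed unique here), value = the Person row.
Dictionary<string, Person> bySurname = persons.ToDictionary(p => p.Surname);

// O(1) lookup of a "row" by its key...
Person wick = bySurname["Wick"];

// ...and the "column" is just a property access.
int age = wick.Age;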

Auditive
1

According to this, the memory overhead of an element in a dictionary is about 20 bytes. So 5000 * 30 * 20 = 3,000,000 bytes, or about 3 MB, plus a few KB for the dictionary objects themselves.

Since this is insignificant for a modern computer it would probably not matter from a memory standpoint. It would probably not matter much from a performance standpoint either.

So the choice between a dictionary of dictionaries, jagged arrays, a 2D array, or explicit row objects should be determined mostly by whichever alternative is easiest to use.

If all cells have the same type, a 2D array might be the simplest option. If the types are different, it would probably be most useful to create a Model or Row class with properties for each column; this provides a higher-level abstraction than plain rows/columns.
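
For illustration, a minimal sketch of the 2D-array variant mentioned in the question, assuming all cells are numeric and using made-up header names:

// Header -> index maps, built once while reading the sheet.
var rowIndex = new Dictionary<string, int>();   // 5000 entries: row header -> row number
var colIndex = new Dictionary<string, int>();   // 30 entries: column header -> column number
var cells = new double[5000, 30];               // the actual values

// ... fill rowIndex, colIndex and cells while reading the Excel file ...

// Lookup by both headers: two dictionary lookups plus one array access.
double value = cells[rowIndex["Row4711"], colIndex["Price"]];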

JonasH
1

As answered by @JonasH, 5000 rows can be handled by any modern system, and defining Model for the structure is good solution to go ahead. On the other hand, you can also load your excel data into a DataTable object. DataTable is easy to use which allows to pool data into it with consistent format, embed with other properties that allows to do sorting, search etc. If the data load is going to increase in future, go ahead by creating a model.