I'm working with a very large number of data objects and need to convert them to text files based on a CSV header map. The header map is essentially a list of fields that must be printed in that particular order; e.g. if the header map is ID, Name, Location, etc., then those three fields of every object need to be written to a CSV file.

The problem is that the list can contain millions of objects and the header map can have hundreds of field names, all of which will always be valid field names on the objects. Currently, the code I'm working with iterates over the entire list of objects and, for each one, uses a switch statement over the CSV field names to print out the required fields. The code looks like this:

foreach (Object o in Objects)
{
    foreach (String FieldName in CsvFieldsList)
    {
        switch (FieldName)
        {
            case "ID":
                // Do something for this specific ID:
                outString.Append(o.ID);
                break;
            ....
            ....
        }
    }
}

AFAIK, C# uses a dictionary for string switch statements, so those should be fast. However, this code takes hours on some of our large datasets, and I'm wondering whether the design can be improved.

tunafish24
  • Maybe [this](http://stackoverflow.com/questions/16191591/what-consumes-less-resources-and-is-faster-file-appendtext-or-file-writealltext) post will help; it explains ways to boost performance. – Measurity May 06 '14 at 06:52
  • What do you do with the o object in the foreach loop? – Kunukn May 06 '14 at 07:17
  • The objects have different types of properties (string, int, Boolean, etc.) and we convert them to their string representation. Note that some of those properties require additional processing. I've also updated the code to give you an idea of what typical (not all) switch cases look like. – tunafish24 May 06 '14 at 08:02
  • You have a running time of `O(m * n)` where m is Objects.Count and n is CsvFieldsList.Count. If you could put the Objects and CsvFieldsList in a dictionary and retrieve data by lookup, then you should be able to get a running time of `O(m + n)` – Kunukn May 06 '14 at 09:40
  • My understanding is that C# switch statement uses a dictionary for large number of string cases. I'm not sure if manually using a dictionary will have any noticeable effect. – tunafish24 May 06 '14 at 19:02
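
A minimal sketch of the lookup idea discussed in the comments, in case it helps make it concrete. It resolves each header name to a delegate once, before the object loop, so the inner loop only invokes delegates and does no string dispatch. `Record`, `CsvExporter`, and the `Formatters` table are hypothetical names, not taken from the question's code. Every field of every object is still visited, so this only removes the per-field switch cost; whether that is noticeable would have to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical record type standing in for the question's objects.
public sealed class Record
{
    public int ID { get; set; }
    public string Name { get; set; }
    public string Location { get; set; }
}

public static class CsvExporter
{
    // Field name -> formatter. Any per-field special processing would live in these lambdas.
    private static readonly Dictionary<string, Func<Record, string>> Formatters =
        new Dictionary<string, Func<Record, string>>
        {
            { "ID", r => r.ID.ToString(CultureInfo.InvariantCulture) },
            { "Name", r => r.Name },
            { "Location", r => r.Location },
        };

    public static void Export(IEnumerable<Record> records, IList<string> csvFields, TextWriter writer)
    {
        // Resolve the header map once, outside the per-object loop.
        var columns = new Func<Record, string>[csvFields.Count];
        for (int i = 0; i < csvFields.Count; i++)
        {
            columns[i] = Formatters[csvFields[i]];
        }

        foreach (var record in records)
        {
            for (int i = 0; i < columns.Length; i++)
            {
                if (i > 0) writer.Write(',');
                // Note: no CSV quoting or escaping is done in this sketch.
                writer.Write(columns[i](record));
            }
            writer.Write('\n');
        }
    }
}
```

Writing to a `TextWriter` (for example a `StreamWriter` over the output file) rather than appending everything to one in-memory string avoids holding millions of rows in memory at once.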

1 Answer

Could you try a parallel approach? E.g., split Objects into smaller batches, then handle each batch in a separate thread or process. Each thread or process writes its own CSV file; after all of them have finished, combine these CSV files.
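
A rough sketch of the batching idea, under some assumptions: `ParallelCsv`, `formatRow`, and `chunkSize` are hypothetical names, and each batch is formatted into an in-memory string instead of a separate file, which is a simplification of the approach described above. The batches are written out in their original order once all of them are done.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class ParallelCsv
{
    // formatRow is a hypothetical callback that turns one object into one CSV line (no newline).
    public static void Write<T>(IReadOnlyList<T> items, Func<T, string> formatRow,
                                TextWriter output, int chunkSize = 10000)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        int chunkCount = (items.Count + chunkSize - 1) / chunkSize;
        var parts = new string[chunkCount];

        // Each chunk is formatted independently; chunk c only writes to parts[c].
        Parallel.For(0, chunkCount, c =>
        {
            int start = c * chunkSize;
            int end = Math.Min(start + chunkSize, items.Count);
            var sb = new StringBuilder();
            for (int i = start; i < end; i++)
            {
                sb.Append(formatRow(items[i])).Append('\n');
            }
            parts[c] = sb.ToString();
        });

        // Combine sequentially so the output keeps the original object order.
        foreach (var part in parts)
        {
            output.Write(part);
        }
    }
}
```

For datasets too large to keep all formatted batches in memory, each batch could instead be written to its own temporary file and the files concatenated afterwards, as the answer suggests.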

Xing Fei
  • We've already tried using multi-threading and it has helped somewhat. I'm curious if there is a better algorithm/approach for this problem. – tunafish24 May 06 '14 at 06:41