
I have to iterate over 1 million items, and my next action depends on successfully iterating over all of them. Right now I am using a C# `while` loop, which takes around 5-6 minutes to complete the iterations. Is there any way to speed up this processing? Later on I might have around 5-6 million items to iterate over.

        System.IO.StreamReader file =
           new System.IO.StreamReader("data.csv");
        System.IO.StreamWriter jsonFile =
           new System.IO.StreamWriter("jsonData.csv");
        BBP obj;
        string line;
        string[] columns = null;
        string[] data;
        int counter = 0;
        var dataList = new Dictionary<string, dynamic>();
        while ((line = file.ReadLine()) != null)
        {
            if (counter == 0)
            {
                // the first line holds the column headers
                columns = line.Split(',');
            }
            else
            {
                data = line.Split(',');

                obj = new BBP();
                obj.BBP_CR_PART_NO = data[0];
                obj.BBP_RO_NO = data[1];
                obj.BBP_BPR_TPN = data[2];
                dataList.Add("item_" + counter, obj);
                Console.WriteLine(counter);
            }
            counter++;
        }
Mohit Vaidh
  • This question is far too vague and speculative. Loops themselves are not inherently slow; 1 million iterations of doing nothing takes _much_ less than a second on modern hardware. But you didn't post the code that is slow, so there's no way for anyone to comment. Please see http://stackoverflow.com/help/mcve – Peter Duniho Nov 03 '14 at 06:58
  • Yup, post code or we can't help. How are you "iterating using while"? Just use a foreach if iterating a collection; I can assure you the slow part is NOT the iteration but whatever you're doing in there – Ronan Thibaudau Nov 03 '14 at 06:58
  • Use multiple threads if you can, use `break` and `continue` when appropriate? – Rufus L Nov 03 '14 at 06:59
  • One thing you should probably do, as soon as you revise your insanely vague question, is to try sorting the array you are iterating over. If it's an array, that is. – Alex Nov 03 '14 at 07:00
  • Added code sample. Please advise now. Thanks!!! – Mohit Vaidh Nov 03 '14 at 07:07
  • You're doing file I/O, so you're going to see a performance hit no matter what. File I/O is always expensive. – Tim Nov 03 '14 at 07:36
  • @Tim, I agree, but loading the file into SQL would be an overhead. What if I load it into memory somehow and then use it? But my file size is 500MB, so again that will be expensive. Any workaround you can think of? – Mohit Vaidh Nov 03 '14 at 07:50
  • @MohitVaidh - Outside of what Rufus suggested in the answer below, there's not much you can do. Loading it into memory first won't give you much of a performance increase, if any, and you may run into memory issues if the file gets very large. 5-6 minutes for a 500MB file seems about right - I have a program that processes a file encoded in EBCDIC, and it takes about 1 minute per 100MB. – Tim Nov 03 '14 at 08:54

1 Answer


You could speed things up if you:

  1. Read the first line outside of the loop, then remove the `if (counter == 0)` check. It is evaluated for every line but only does useful work once.
  2. Change `dataList` to a `List<BBP>`. That way you don't need to build a string key when adding items (the concatenation has a small cost), and you already have an index that you can reference later (if that works for you).
  3. Remove the counter increment (and, at that point, the counter altogether).
  4. Remove the `Console` output; writing to the console on every iteration is slow. (A revised loop with all four changes is sketched below.)
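
Putting all four changes together might look something like this minimal sketch, reusing the `BBP` class and file name from the question (and assuming the file always starts with a header line):

    var dataList = new List<BBP>();

    using (var file = new System.IO.StreamReader("data.csv"))
    {
        // Read the header row once, outside the loop
        string[] columns = file.ReadLine().Split(',');

        string line;
        while ((line = file.ReadLine()) != null)
        {
            string[] data = line.Split(',');

            // No string key to build, no counter, no console output
            dataList.Add(new BBP
            {
                BBP_CR_PART_NO = data[0],
                BBP_RO_NO = data[1],
                BBP_BPR_TPN = data[2]
            });
        }
    }

The `using` block also ensures the reader is disposed, which the original snippet never does.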

Also, consider splitting your file into multiple smaller files, which would let you implement a multi-threaded solution like the answer to this question: Read and process files in parallel C#
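
As a rough sketch of that idea (the `data_part*.csv` naming is just an assumption about how you might split the file), `Parallel.ForEach` can process the pieces concurrently:

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    static List<BBP> LoadAllParts()
    {
        var allItems = new ConcurrentBag<BBP>();

        // Hypothetical naming scheme for the split files
        string[] parts = Directory.GetFiles(".", "data_part*.csv");

        Parallel.ForEach(parts, path =>
        {
            // Skip(1) drops each part's header row
            foreach (string line in File.ReadLines(path).Skip(1))
            {
                string[] data = line.Split(',');
                allItems.Add(new BBP
                {
                    BBP_CR_PART_NO = data[0],
                    BBP_RO_NO = data[1],
                    BBP_BPR_TPN = data[2]
                });
            }
        });

        return allItems.ToList();
    }

`ConcurrentBag<BBP>` is used because multiple threads add items at the same time; the trade-off is that you lose the original line ordering.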

If you can't split up your file, you might consider reading chunks of lines (maybe 1000 at a time) into a `List<string>`, then sending that list to a `Task` for processing. That way, while a task is processing one batch of 1000 lines, you're already reading the next 1000, and so on. Check out this example: http://msdn.microsoft.com/en-us/library/jj155756.aspx
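
A minimal sketch of that pattern, where `ParseChunk` is a hypothetical helper standing in for whatever per-line work you need:

    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    static List<BBP> LoadInChunks()
    {
        const int ChunkSize = 1000; // tune to taste
        var tasks = new List<Task<List<BBP>>>();

        using (var file = new StreamReader("data.csv"))
        {
            file.ReadLine(); // skip the header row

            var chunk = new List<string>(ChunkSize);
            string line;
            while ((line = file.ReadLine()) != null)
            {
                chunk.Add(line);
                if (chunk.Count == ChunkSize)
                {
                    var toProcess = chunk;               // hand the full chunk off...
                    chunk = new List<string>(ChunkSize); // ...and keep reading into a fresh one
                    tasks.Add(Task.Run(() => ParseChunk(toProcess)));
                }
            }
            if (chunk.Count > 0)
                tasks.Add(Task.Run(() => ParseChunk(chunk)));
        }

        // .Result blocks until each task finishes; flatten everything into one list
        return tasks.SelectMany(t => t.Result).ToList();
    }

    // Hypothetical helper: put whatever per-line work you need here
    static List<BBP> ParseChunk(List<string> lines)
    {
        var result = new List<BBP>(lines.Count);
        foreach (string l in lines)
        {
            string[] data = l.Split(',');
            result.Add(new BBP { BBP_CR_PART_NO = data[0], BBP_RO_NO = data[1], BBP_BPR_TPN = data[2] });
        }
        return result;
    }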

Finally, consider storing your data in a database instead of .csv files.
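
If you do go the database route, `SqlBulkCopy` is one way to make the one-time load fast. A rough sketch, where the connection string and table name are placeholders and `dataList` is the `List<BBP>` from point 2:

    using System.Data;
    using System.Data.SqlClient;

    // Hypothetical connection string and table; adjust to your environment
    using (var bulk = new SqlBulkCopy("Server=.;Database=Parts;Integrated Security=true"))
    {
        bulk.DestinationTableName = "dbo.BBP";

        // Stage the parsed rows in a DataTable and push them in one batch
        var table = new DataTable();
        table.Columns.Add("BBP_CR_PART_NO");
        table.Columns.Add("BBP_RO_NO");
        table.Columns.Add("BBP_BPR_TPN");

        foreach (BBP item in dataList)
            table.Rows.Add(item.BBP_CR_PART_NO, item.BBP_RO_NO, item.BBP_BPR_TPN);

        bulk.WriteToServer(table);
    }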

Rufus L
  • Rufus, actually that is what the requirement was: to read from a file and create objects out of it. I will incorporate your inputs and update. But generally, is there any way to speed up just this loop? – Mohit Vaidh Nov 03 '14 at 07:42
  • @MohitVaidh Yeah, my first four points above, plus the redesign option of using Tasks? – Rufus L Nov 03 '14 at 07:43
  • Sure Rufus. Regarding Tasks, I am not that good with them. Can you please give an idea of what to move into the Task part? – Mohit Vaidh Nov 03 '14 at 07:52
  • Try out the example from the link, and try to apply it to your situation. The best way to get good at something is to try it!! – Rufus L Nov 03 '14 at 07:55