1

I have this application for generating bank account numbers:

    static void Main(string[] args)
    {

        string path = @"G:\BankNumbers";
        var bans = BankAcoutNumbers.BANS;
        const int MAX_FILES = 80;
        const int BANS_PER_FILE = 81818182/80;
        int bansCounter = 0;
        var part = new List<int>();
        var maxNumberOfFiles = 10;
        Stopwatch timer = new Stopwatch();
        var fileCounter = 0;


        if (!Directory.Exists(path))
        {
            DirectoryInfo di = Directory.CreateDirectory(path);
        }

        try
        {
            while (fileCounter <= maxNumberOfFiles)
            {
                timer.Start();
                foreach (var bank in BankAcoutNumbers.BANS)
                {
                    part.Add(bank);
                    if (++bansCounter >= BANS_PER_FILE)
                    {
                        string fileName = string.Format("{0}-{1}", part[0], part[part.Count - 1]);
                        string outputToFile = "";// Otherwise you dont see the lines in the file. Just single line!!

                        Console.WriteLine("NR{0}", fileName);
                        string subString = System.IO.Path.Combine(path, "BankNumbers");//Needed to add, because otherwise the files will not stored in the correct folder!!
                        fileName =  subString + fileName;

                        foreach (var partBan in part)
                        {

                            Console.WriteLine(partBan);
                            outputToFile += partBan + Environment.NewLine;//Writing the lines to the file

                        }
                        System.IO.File.WriteAllText(fileName, outputToFile);//Writes to file system.
                        part.Clear();
                        bansCounter = 0;
                        //System.IO.File.WriteAllText(fileName, part.ToString());

                        if (++fileCounter >= MAX_FILES)
                            break;
                    }
                }
            }

            timer.Stop();
            Console.WriteLine(timer.Elapsed.Seconds);
        }
        catch (Exception)
        {

            throw;
        }

        System.Console.WriteLine("Press any key to exit.");
        System.Console.ReadKey();
    }

This generates about 81 million bank account records spread over 80 files. Can I speed up the process with threading?

leppie
savantKing
  • 1
    Probably not, no. Either way, you're welcome to try it and find out for yourself. That's the best way to get a conclusive answer. – Servy Jan 21 '15 at 15:24
  • 1
    Did you try http://stackoverflow.com/questions/16191591/what-consumes-less-resources-and-is-faster-file-appendtext-or-file-writealltext? – L-Four Jan 21 '15 at 15:38
  • Use a `StringBuilder` instead of string concatenation in a loop. Or simply use `File.WriteAllLines(fileName, part)` to eliminate the loop. – CodesInChaos Jan 21 '15 at 16:27
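
The two suggestions in the last comment could look roughly like the sketch below. `BatchWriter`, `BuildContent`, and `WriteBatch` are hypothetical names used only for illustration. Because `part` in the question is a `List<int>`, the numbers are converted to strings before being passed to `File.WriteAllLines`, which expects strings.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper class, not part of the question's code.
public static class BatchWriter
{
    // Option 1: build the content with a StringBuilder instead of
    // concatenating strings inside the loop.
    public static string BuildContent(IEnumerable<int> part)
    {
        var sb = new StringBuilder();
        foreach (int ban in part)
        {
            sb.AppendLine(ban.ToString());
        }
        return sb.ToString();
    }

    // Option 2: let File.WriteAllLines do the looping, one number per line.
    public static void WriteBatch(string fileName, IEnumerable<int> part)
    {
        File.WriteAllLines(fileName, part.Select(ban => ban.ToString()));
    }
}
```

Inside the question's loop, `BatchWriter.WriteBatch(fileName, part)` would then stand in for the inner `foreach` and the `File.WriteAllText` call.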

2 Answers

1

You're talking about speeding up a process whose bottleneck is overwhelmingly likely the file write speed. You can't really effectively parallelize writing to a single disk.

You may see a slight increase in speed if you spawn a worker thread responsible only for file I/O. In other words, create a buffer and have your main thread dump contents into it while the other thread writes it to disk. It's the classic producer/consumer dynamic. I wouldn't expect serious speed gains, however.

Also keep in mind that writing to the console will slow you down, but you can keep that in the main thread and you'll probably be fine. Just make sure you put a limit on the buffer size and have the producer thread hang back when the buffer is full.
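
A minimal sketch of that producer/consumer arrangement, assuming the numbers arrive as an `IEnumerable<int>` (for example `BankAcoutNumbers.BANS` from the question). `ProducerConsumerWriter` and the `maxBuffered` parameter are hypothetical names. `BlockingCollection<T>` with a bounded capacity is one way to get the "hang back when the buffer is full" behaviour, since `Add` blocks while the queue is full.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper class illustrating a bounded producer/consumer queue.
public static class ProducerConsumerWriter
{
    public static void Write(IEnumerable<int> numbers, string filePath, int maxBuffered = 100000)
    {
        // Bounded queue: Add() blocks once maxBuffered items are waiting.
        using (var queue = new BlockingCollection<int>(maxBuffered))
        {
            // Consumer: the only thread that touches the file.
            Task consumer = Task.Run(() =>
            {
                using (var writer = new StreamWriter(filePath))
                {
                    foreach (int number in queue.GetConsumingEnumerable())
                    {
                        writer.WriteLine(number);
                    }
                }
            });

            try
            {
                // Producer: the calling thread fills the queue.
                foreach (int number in numbers)
                {
                    queue.Add(number);
                }
            }
            finally
            {
                // Signal "no more items", then wait for the file to be flushed.
                queue.CompleteAdding();
                consumer.Wait();
            }
        }
    }
}
```

Error handling is deliberately thin here: if the consumer fails early (for example because the file cannot be opened), the producer could block on a full queue, so a real implementation would need to deal with that case.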

Edit: Also have a look at the link L-Four provided; using a BufferedStream would be an improvement (and would probably render a consumer thread unnecessary).
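
As a rough illustration of the buffered approach, the sketch below streams the numbers through a `BufferedStream` rather than building one large string per file. `BufferedBatchWriter` and the default buffer size are assumptions made for the example. `FileStream` and `StreamWriter` already do some buffering of their own, so how much the extra layer helps would need to be measured.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper class illustrating buffered, streaming output.
public static class BufferedBatchWriter
{
    public static void Write(string filePath, IEnumerable<int> numbers, int bufferSize = 1 << 16)
    {
        using (var file = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None))
        using (var buffered = new BufferedStream(file, bufferSize))
        using (var writer = new StreamWriter(buffered, new UTF8Encoding(false)))
        {
            // Each number goes straight to the writer; nothing is
            // accumulated in a string first.
            foreach (int number in numbers)
            {
                writer.WriteLine(number);
            }
        }
    }
}
```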

TASagent
  • Thank you for your comment. Can you give an example? And isn't the async keyword something that would speed this up? – savantKing Jan 21 '15 at 15:34
  • The issue here is going to be writing the file to disk. You could use multiple threads to generate the data you want to store, but you're *already* going to be spawning the data faster than you can write it to disk. If you speed that up, the queue will just fill up faster. My expertise isn't C#, but I suspect it's already a buffered output, meaning you should probably expect minimal gains from any multi-threading. – TASagent Jan 21 '15 at 15:38
  • @Nielsfischerein No, it's not. It's a keyword that makes writing asynchronous code easier. That's all. – Servy Jan 21 '15 at 15:44
  • But can you do this with threading, creating one thread per file? If so, how? – savantKing Jan 22 '15 at 10:10
0

Your process can be divided into two steps:

  1. Generate an account
  2. Save the account in the file

The first step can be done in parallel, as there is no dependency between accounts. That is, while creating account number xyz you don't have to rely on data from account xyz - 1 (which may not have been created yet).
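
Since the question does not show how the numbers are actually generated, the following is only a sketch of the idea: it assumes generation amounts to testing each candidate in a range with some independent check, passed in here as a hypothetical `isValid` delegate. PLINQ's `AsOrdered` keeps the results in ascending order even though the candidates are tested in parallel.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; the real generation rule is not shown in the question.
public static class ParallelGenerator
{
    public static List<int> Generate(int start, int count, Func<int, bool> isValid)
    {
        return ParallelEnumerable.Range(start, count)
            .AsOrdered()      // preserve ascending order in the output
            .Where(isValid)   // each candidate is checked independently
            .ToList();
    }
}
```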

The problematic bit is writing the data to the file. You don't want several threads trying to access and write to the same file, and adding locks will likely make your code a nightmare to maintain. The other issue is that writing to the file is what slows the whole process down.

At the moment, creating an account and writing it to the file happen in a single step in your code.

What you can try is to separate these steps. First you create all the accounts and keep them in some collection. Here multi-threading can be used safely. Only when all the accounts have been created do you save them.

Improving the saving step will take a bit more work. You will have to divide all the accounts into 8 separate collections. For each collection you create a separate file. Then you can take the first collection and the first file, and create a thread that writes the data to that file. The same goes for the second collection and the second file, and so on. These 8 writes can run in parallel, and you do not have to worry that more than one thread will try to access the same file.

Below is some pseudo-code to illustrate the idea:

    public void CreateAndSaveAccounts()
    {
        List<Account> accounts = this.CreateAccounts();

        // Divide the accounts into separate batches
        // Of course the process can (and should) be automated.
        List<List<Account>> accountsInSeparateBatches =
            new List<List<Account>>
            {
                accounts.GetRange(0, 10000000),             // First batch of 10 million
                accounts.GetRange(10000000, 10000000),      // Second batch of 10 million
                accounts.GetRange(20000000, 10000000)       // Third batch of 10 million
                // ...
            };

        // Save accounts in parallel
        Parallel.For(0, accountsInSeparateBatches.Count,
            i =>
                {
                    string filePath = string.Format(@"C:\file{0}", i);
                    this.SaveAccounts(accountsInSeparateBatches[i], filePath);
                }
            );
    }

    public List<Account> CreateAccounts()
    {
        // Create accounts here
        // and return them as a collection.
        // Use parallel processing wherever possible
        throw new NotImplementedException(); // placeholder so the sketch compiles
    }

    public void SaveAccounts(List<Account> accounts, string filePath)
    {
        // Save accounts to file
        // The method creates a thread to do the work.
    }
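
The pseudo-code above could be fleshed out along these lines. This is a sketch under assumptions: plain `int` values stand in for `Account`, and `BatchSaver`, `SplitIntoBatches`, `SaveInParallel`, and the `file{0}.txt` naming are hypothetical. The batching is automated with a loop, and each parallel iteration writes to its own file, so no two threads share a file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper class; int stands in for the Account type.
public static class BatchSaver
{
    // Splits the list into consecutive batches of at most batchSize items.
    public static List<List<int>> SplitIntoBatches(List<int> items, int batchSize)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException("batchSize");
        }

        var batches = new List<List<int>>();
        for (int offset = 0; offset < items.Count; offset += batchSize)
        {
            int size = Math.Min(batchSize, items.Count - offset);
            batches.Add(items.GetRange(offset, size));
        }
        return batches;
    }

    // Writes batch i to its own file inside the given directory.
    public static void SaveInParallel(List<List<int>> batches, string directory)
    {
        Parallel.For(0, batches.Count, i =>
        {
            string filePath = Path.Combine(directory, string.Format("file{0}.txt", i));
            File.WriteAllLines(filePath, batches[i].Select(n => n.ToString()));
        });
    }
}
```

Whether the parallel writes actually finish sooner than sequential ones depends on the storage device. On a single disk the writes may simply compete with each other, so it is worth timing both variants.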
PiotrWolkowski