
I am currently writing a WinForms C# application that will allow users to cleanse text / log files. At present the app is working, but if the file is massive, e.g. 10 MB, it takes an age!

The first cleanse it does is for users' Windows Auth names, i.e. who was logged in at the time. I have a text file of all users in our organisation, roughly 10,000.

I load this into a list:

List<string> loggedUsers = new List<string>();

string[] userList = System.IO.File.ReadAllLines(@"C:\temp\logcleaner\users.txt");
foreach (string line in userList)
{
    loggedUsers.Add(line);
}

Next I take a text file and show it in a RichTextBox (rtbOrgFile), allowing the user to see what information is currently there. The user then clicks a button which does the following:

foreach (var item in loggedUsers)
            {
                if (rtbOrgFile.Text.Contains(item.ToString()))
                {
                    if(foundUsers.Items.Contains(item.ToString()))
                    {
                        // already in list
                    }
                    else
                    {
                        foundUsers.Items.Add(item.ToString());
                    }

                }
            }

My question is, is this the most efficient way? Or is there a far better way to go about this? The code is working fine, but as you start to get into big files it is incredibly slow.

Gareth
  • Go about what? Please take a step back from the code and explain what you're trying to do, then we can look at the code to see if this is fitting, right now I don't even understand what you're trying to do here. What does "cleanse" mean? What is "loggedUsers"? What about "foundUsers"? – Lasse V. Karlsen Sep 22 '16 at 07:16
  • Sorry 'cleanse' - to remove anything / to clean the file. loggedUsers is the list of all users within our Org. 'FoundUsers' is a listbox that gets populated if a 'LoggedUser' is found within the textfile, i.e. loggedUser would contain JBLOGGS, if JBLOGGS is found within the textfile, it gets added to foundUsers. The code then loops for all users listed in loggedUsers. A button then comes later to 'Clean' which replaces any foundUser with DATAREMOVED in the file outputted by the system. – Gareth Sep 22 '16 at 07:21

1 Answer


First, I would advise the following for loading your list (ToList() requires using System.Linq;):

List<string> loggedUsers = System.IO.File.ReadAllLines("[...]users.txt").ToList();

You didn't specify how large the text file that you load into the RichTextBox is, but I assume it is quite large, since it takes so long.

This answer suggests the Lucene.NET search engine, but it also shows a simple way to multi-thread the search without that engine, which makes it faster. I would translate the example to:

string text = rtbOrgFile.Text; // read the control once, on the UI thread
var foundUsers = loggedUsers.AsParallel().Where(user => text.Contains(user)).ToList();

This way, it checks for multiple logged users at once.
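As a minimal, self-contained sketch of that parallel search: it assumes the RichTextBox contents have already been copied into a plain string (the parameter text below), so the worker threads never touch the UI control. The class name ParallelUserSearch, the method FindUsers and the sample data are hypothetical and only there for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelUserSearch
{
    // Returns every distinct user name that occurs somewhere in text.
    // text is assumed to be a snapshot of rtbOrgFile.Text taken on the UI thread.
    public static List<string> FindUsers(IEnumerable<string> users, string text)
    {
        return users
            .AsParallel()
            // Skip blank lines: "anything".Contains("") is always true.
            .Where(user => user.Length > 0 && text.Contains(user))
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data standing in for users.txt and the log file.
        var loggedUsers = new List<string> { "JBLOGGS", "ASMITH", "XNOBODY" };
        string text = "07:16 JBLOGGS logged in; 07:21 ASMITH logged out";

        // PLINQ does not guarantee result order, so sort before displaying.
        foreach (string user in FindUsers(loggedUsers, text).OrderBy(u => u))
        {
            Console.WriteLine(user);
        }
    }
}
```

In the WinForms app the returned list could then be added to the list box in one go on the UI thread, which also avoids the per-item foundUsers.Items.Contains check.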

You need at least .NET 4.0 for Parallel LINQ (which this example uses), as far as I know. If you don't have access to .NET 4.0, you could try to manually create one or two threads and let each one handle an equal part of loggedUsers. Each thread would build its own foundUsers list and report it back to you, and you would then merge them into a single list using List<T>.AddRange(anotherList).
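The manual-thread variant could look roughly like the sketch below. It is an illustration under stated assumptions, not a tuned implementation: ThreadedUserSearch, FindUsers and threadCount are hypothetical names, threadCount is assumed to be at least 1, and text is again assumed to be a string copied out of the RichTextBox beforehand.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadedUserSearch
{
    // Splits users into threadCount contiguous slices, searches each slice on
    // its own thread, then merges the per-thread lists with AddRange.
    public static List<string> FindUsers(IList<string> users, string text, int threadCount)
    {
        var partial = new List<string>[threadCount];
        var threads = new Thread[threadCount];
        int sliceSize = (users.Count + threadCount - 1) / threadCount; // ceiling division

        for (int t = 0; t < threadCount; t++)
        {
            // Locals declared inside the loop so each lambda captures its own copy.
            int start = t * sliceSize;
            int end = Math.Min(start + sliceSize, users.Count);
            List<string> found = partial[t] = new List<string>();

            threads[t] = new Thread(() =>
            {
                for (int i = start; i < end; i++)
                {
                    if (users[i].Length > 0 && text.Contains(users[i]))
                    {
                        found.Add(users[i]); // each thread writes only to its own list
                    }
                }
            });
            threads[t].Start();
        }

        var merged = new List<string>();
        for (int t = 0; t < threadCount; t++)
        {
            threads[t].Join();           // wait for this slice to finish
            merged.AddRange(partial[t]); // then merge its results
        }
        return merged;
    }

    public static void Main()
    {
        // Made-up sample data for illustration.
        var loggedUsers = new List<string> { "JBLOGGS", "ASMITH", "XNOBODY", "KJONES" };
        string text = "JBLOGGS opened the file, KJONES closed it";

        foreach (string user in FindUsers(loggedUsers, text, 2))
        {
            Console.WriteLine(user);
        }
    }
}
```

Because every thread only reads the shared list and string and writes to its own result list, no locking is needed until the merge, which happens after Join.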

Manuel Hoffmann
  • Great stuff. Will have a look. I've changed the loop from a foreach to a for(int i...) and it's returning results in about 1 min 30 sec, compared with 10+ minutes for the foreach. Appreciate it. – Gareth Sep 22 '16 at 07:50