
I have written the LINQ statement below, but it takes a very long time to run because there are so many lines. My CPU has 8 cores, but only one of them is used because the query runs on a single thread.

So I wonder: can the final statement be made to run on multiple threads?

        List<string> lstAllLines = File.ReadAllLines("AllLines.txt").ToList();
        List<string> lstBannedWords = File.ReadAllLines("allBaddWords.txt")
            .Select(s => s.ToLowerInvariant())
            .Distinct().ToList();

The statement I am asking about is the one below. Can it run on multiple threads?

        List<string> lstFoundBannedWords = lstBannedWords
            .Where(s => lstAllLines
                .SelectMany(ls => ls.ToLowerInvariant().Split(' '))
                .Contains(s))
            .Distinct().ToList();

C# 5, .NET Framework 4.5

Furkan Gözükara

2 Answers


The following snippet performs that operation using the Task Parallel Library's Parallel.ForEach method. It takes each line in your 'all lines' file, splits it on spaces, and then searches the resulting words for banned words. Parallel.ForEach should use all available cores on your machine's processor. Hope this helps.

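// Thread-safe collection: a plain List<string> is not safe for concurrent writes.
var foundBannedWords = new System.Collections.Concurrent.ConcurrentBag<string>();

System.Threading.Tasks.Parallel.ForEach(
    lstAllLines,
    line =>
    {
        var wordsInLine = line.ToLowerInvariant().Split(' ');
        // Where (not All) selects the banned words that actually occur in this line.
        foreach (var bannedWord in lstBannedWords.Where(b => wordsInLine.Contains(b)))
            foundBannedWords.Add(bannedWord);
    });

List<string> lstFoundBannedWords = foundBannedWords.Distinct().ToList();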
ajawad987

There is room for performance improvement before resorting to AsParallel:

HashSet<string> lstAllLines = new HashSet<string>(
                                File.ReadAllLines("AllLines.txt")
                                    .SelectMany(ls => ls.ToLowerInvariant().Split(' ')));

List<string> lstBannedWords = File.ReadAllLines("allBaddWords.txt")
                                    .Select(s => s.ToLowerInvariant())
                                    .Distinct().ToList();

List<string> lstFoundBannedWords = lstBannedWords.Where(s => lstAllLines.Contains(s))
                                    .Distinct().ToList();

Since a HashSet lookup is O(1) and lstBannedWords is the shorter list, you may not need any parallelism at all (total search time = lstBannedWords.Count * O(1)). Finally, you always have the option of AsParallel.
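
If the lookup still turns out to be a bottleneck, the AsParallel option could be combined with the HashSet along the lines of the sketch below. It is only an illustration under some assumptions: the `BannedWordSearch` class and `FindBannedWords` method are hypothetical names that do not appear in the original code, and in-memory sample arrays stand in for the two files so that the example runs on its own. Whether PLINQ actually helps here depends on the data size, so it is worth measuring both versions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BannedWordSearch
{
    // Hypothetical helper (not part of the original post): returns the banned
    // words that occur as whole, space-separated words in any of the lines.
    public static List<string> FindBannedWords(
        IEnumerable<string> allLines, IEnumerable<string> bannedWords)
    {
        // Lower-casing and splitting run in parallel; the HashSet constructor
        // then consumes the results on the calling thread.
        var allWords = new HashSet<string>(
            allLines.AsParallel()
                    .SelectMany(line => line.ToLowerInvariant().Split(' ')));

        // Assumption: the set is only read from here on. Concurrent reads of a
        // HashSet that nothing is modifying are generally treated as safe.
        return bannedWords.Select(w => w.ToLowerInvariant())
                          .Distinct()
                          .AsParallel()
                          .Where(w => allWords.Contains(w))
                          .ToList();
    }

    public static void Main()
    {
        // Sample data standing in for AllLines.txt and allBaddWords.txt.
        var lines = new[] { "Some BAD words here", "nothing to see", "an ugly line" };
        var banned = new[] { "bad", "Ugly", "missing" };

        var found = FindBannedWords(lines, banned);
        found.Sort(StringComparer.Ordinal); // PLINQ does not guarantee result order
        Console.WriteLine(string.Join(",", found)); // prints "bad,ugly"
    }
}
```

With the real files, the two arrays would be replaced by `File.ReadAllLines("AllLines.txt")` and `File.ReadAllLines("allBaddWords.txt")`. The final filter does very little work per item, so most of any gain is likely to come from the parallel split rather than from the parallel lookup.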

I4V
  • It wasn't me but you might want to rename the `lstAllLines` variable to something like `hashAllWords` to make the code easier to understand. – Dirk May 31 '13 at 14:53
  • @Dirk I just wanted to preserve the variable names for the OP. After all, it is just Refactor/Rename when using VS. – I4V May 31 '13 at 15:01
  • Actually, this is not exactly what I am doing :) Also, I did not downvote. I am checking at the word-by-word level; you are checking at the literal level. – Furkan Gözükara May 31 '13 at 21:16
  • @MonsterMMORPG Have you tested it before commenting? – I4V May 31 '13 at 21:50