
I have a HashSet of strings containing log file names, called pidLogsLines. From this HashSet I want to keep only the log names that contain a string from another HashSet called pidList, and put those log names into a new HashSet called filesWithPid. I am doing this with two loops:

var filesWithPid = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
foreach (var logname in pidLogsLines)
{
    foreach (var pid in pidList)
    {
        if (logname.Contains(pid))
        {
            filesWithPid.Add(logname);
            break; // one match is enough; no need to test the remaining PIDs
        }
    }
}

Is this the optimal way to do it? I am new to C#, so I am not aware of any more elaborate, faster ways to do it.
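For reference, the same filtering can be written as a single LINQ expression. This is a minimal sketch with made-up sample data (the file names and PIDs below are hypothetical); it has the same O(n·m) cost as the nested loops, but `Any` short-circuits on the first matching PID:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical stand-ins for the question's sets.
        var pidLogsLines = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "app_1234.log", "app_5678.log", "system.log"
        };
        var pidList = new HashSet<string> { "1234", "9999" };

        // Keep each log name that contains at least one PID substring.
        var filesWithPid = new HashSet<string>(
            pidLogsLines.Where(logname => pidList.Any(pid => logname.Contains(pid))),
            StringComparer.OrdinalIgnoreCase);

        Console.WriteLine(string.Join(",", filesWithPid)); // app_1234.log
    }
}
```

Whether this is clearer than the explicit loops is a matter of taste; it does not change the asymptotic work done.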

devn00b1090
  • Honestly, a HashSet doesn't provide any benefit here. – ProgrammingLlama Mar 31 '21 at 01:08
  • @Llama This logic is actually part of a bigger application, and the application uses these hashsets. The hashsets are used for filtering up to this logic, so that's why I am using them. – devn00b1090 Mar 31 '21 at 01:12
  • How many elements are each HashSet and are they ordered? Depending on size and ordering, you may want to opt for a binary search. Otherwise, this seems to be standard. – Hayden Mar 31 '21 at 01:15
  • The hashset pidLogsLines can have 50000 items (lognames) but the second hashset pidList will only have at most 200 items (PIDs). No they are not ordered @Hayden – devn00b1090 Mar 31 '21 at 01:19
    I think it's a problem that can be parallelized. You could use multiple threads to scan through multiple hash sets in parallel. While visiting each item, add the string to a shared concurrent (thread safe) dictionary. At the end, you'd have a data structure which only contains the shared items you're looking for. I think the code you have above, though, is the *simplest* way to do it, so be sure you're not prematurely optimizing things that don't need to be that complex. – Mike Christensen Mar 31 '21 at 01:19
  • I agree with Mike. If this meets your performance requirements, then I'd say leave it as is. If you need more performance and can spare more memory, converting them to lists would be more efficient to loop over. On top of that, you can also use multithreading if even more performance is needed. I can provide a simple example of this if needed. – Lolop Mar 31 '21 at 01:26
  • @MikeChristensen Yeah, I agree. I looked at the LINQ solution linked, but being new to LINQ I have no idea what I am looking at syntactically, so I will stick to the standard way I did it. Thanks, guys. But I need to learn LINQ. – devn00b1090 Mar 31 '21 at 01:41
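The parallelization idea from the comment thread can be sketched with PLINQ rather than hand-rolled threads and a concurrent dictionary. This is a minimal example with hypothetical sample data: `AsParallel()` partitions pidLogsLines across worker threads, each log name is tested against the small pidList independently, and the results are collected back into one HashSet on the calling thread, so no explicit locking is needed:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        // Hypothetical stand-ins for the question's sets.
        var pidLogsLines = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "app_1234.log", "app_5678.log", "system.log", "worker_9999.log"
        };
        var pidList = new List<string> { "1234", "9999" };

        // PLINQ spreads the Contains checks across threads; the final
        // HashSet constructor runs sequentially once all results are in.
        var filesWithPid = new HashSet<string>(
            pidLogsLines
                .AsParallel()
                .Where(logname => pidList.Any(pid => logname.Contains(pid))),
            StringComparer.OrdinalIgnoreCase);

        Console.WriteLine(filesWithPid.Count); // 2
    }
}
```

As the comments note, at ~50,000 × 200 items the sequential version may already be fast enough; measure before reaching for parallelism, since the per-thread overhead can outweigh the savings on small inputs.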

0 Answers