Lists of related items

Question

I have a text file with formatting similar to the following:

#
example1.com;example2.com;example3.com
example4.net;example6.org
example7.uk;example8.io;ab123example4.net
#

Each line defines domains owned by a single company. Each line can have 2 or more domains.

Unfortunately I cannot modify the formatting of the file.

I am not overly familiar with C# (I generally work with bash/sh on Linux/Unix, where I would probably default to grep), and I am trying to extend some existing C# software with a check for whether two domains are owned by the same company.

At present I'm reading the file as follows:

private List<string> _CompanyOwnedDomains;

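// Path.Combine adds the directory separator if the variable's value lacks a trailing one
private string CompanyOwnedDomainsFileName = Path.Combine(
                Environment.GetEnvironmentVariable(
                    "DomainChecker",
                    EnvironmentVariableTarget.Machine),
                @"Path\To\CompanyOwnedDomains.config");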

// Various error checking happens here

_CompanyOwnedDomains = File
                        .ReadAllLines(CompanyOwnedDomainsFileName)
                        .Where(line => !String.IsNullOrEmpty(line))
                        .Where(line => !line.StartsWith("#"))
                        .Select(line => line.ToLower())
                        .ToList();

When I get to the check itself, I am a bit stuck on how to work with the list above.

For argument's sake, let's say I have two variables, DomainA and DomainB. I would like to check whether both domains are owned by the same company.

I could do something like the following, but it seems quite inefficient:

var Match = _CompanyOwnedDomains
    .FirstOrDefault(DomainsList => DomainsList.Contains(DomainA.ToString()));

if (Match != null && Match.Contains(DomainB.ToString()))
{
    // Do stuff
}

Is there a way to check whether both values exist within the same list item?
Would the Contains method match ab123example4.net for a query of "example4.net" or similar?
Would I be better off using a different collection type, such as a dictionary?

Camilo Terevinto · Accepted Answer · 2018-04-25T22:01:29.113

Yes, just add the condition to the filter:

var match = _CompanyOwnedDomains
    .FirstOrDefault(domains => domains.Contains(domainA.ToString())
                            && domains.Contains(domainB.ToString()));
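The combined filter above still does substring matching on each raw line. A minimal sketch of an alternative, assuming the lines have already been read as in the question: split every line on ';' once, keep one HashSet<string> per company, and check whether a single set contains both domains as exact entries. The class and method names (CompanyDomains, Parse, SameCompany) are hypothetical, and ToLowerInvariant is used where the question uses ToLower.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keeps one set of domains per company (one per file line).
public static class CompanyDomains
{
    // Skips blank and '#' lines, then splits each remaining line into its own set.
    public static List<HashSet<string>> Parse(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Where(line => !line.StartsWith("#"))
            .Select(line => new HashSet<string>(line.ToLowerInvariant().Split(';')))
            .ToList();
    }

    // True only when a single company's set contains both domains as exact entries.
    public static bool SameCompany(IEnumerable<HashSet<string>> companies, string domainA, string domainB)
    {
        string a = domainA.ToLowerInvariant();
        string b = domainB.ToLowerInvariant();
        return companies.Any(set => set.Contains(a) && set.Contains(b));
    }
}
```

With the raw-line filter, "example4.net" and "example8.io" would both be found in the third sample line (the first only as part of ab123example4.net); with per-line sets, SameCompany returns false for that pair.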

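Yes. String.Contains performs a substring search, so "ab123example4.net".Contains("example4.net") returns true, and a check against the raw line can report false positives.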
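To see the difference concretely, here is a small sketch contrasting String.Contains on the raw line with an exact comparison against the ';'-separated entries. The line is taken from the sample file; the class name SubstringDemo is hypothetical.

```csharp
using System.Linq;

// Hypothetical demo class contrasting substring and exact-entry matching.
public static class SubstringDemo
{
    // Substring match: true if the text appears anywhere in the raw line.
    public static bool RawLineContains(string line, string domain) => line.Contains(domain);

    // Exact match: true only if one ';'-separated entry equals the domain.
    public static bool EntryEquals(string line, string domain) => line.Split(';').Contains(domain);
}

// For line = "example7.uk;example8.io;ab123example4.net":
//   RawLineContains(line, "example4.net") -> true  (matches inside ab123example4.net)
//   EntryEquals(line, "example4.net")     -> false
```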

You could use a HashSet<string> instead of a List<string>:

_CompanyOwnedDomains = new HashSet<string>(
    // ReadLines enumerates lazily, so lines are processed before the entire file has been read
    File.ReadLines(CompanyOwnedDomainsFileName)
        .Where(line => !String.IsNullOrEmpty(line))
        .Where(line => !line.StartsWith("#"))
        .Select(line => line.ToLower()));

As @Steve noted, you would be better off splitting each line and working with the individual entries directly:

_CompanyOwnedDomains = new HashSet<string>(
    // ReadLines enumerates lazily, so lines are processed before the entire file has been read
    File.ReadLines(CompanyOwnedDomainsFileName)
        .Where(line => !String.IsNullOrEmpty(line))
        .Where(line => !line.StartsWith("#"))
        .SelectMany(line => line.ToLower().Split(';')));

Then you could simplify the search with:

var match = _CompanyOwnedDomains
    .FirstOrDefault(domains => domains == domainA.ToString()
                            || domains == domainB.ToString());
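Flattening every line into one HashSet<string> drops the information about which line (company) each domain came from, so the set alone can tell you whether a domain is known but not whether two domains share an owner. Since the question asks about a dictionary, here is a minimal sketch of that approach: map each domain to the index of the line it appears on, then compare the indices. Using the line index as a company id is an assumption (the file has no explicit company names), and DomainOwners, Build and SameOwner are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: maps each domain to a company id (the index of its line).
public static class DomainOwners
{
    public static Dictionary<string, int> Build(IEnumerable<string> lines)
    {
        var owners = new Dictionary<string, int>();
        int companyId = 0;

        // Skip blank or whitespace-only lines and '#' comment lines.
        foreach (var line in lines.Where(l => !string.IsNullOrWhiteSpace(l) && !l.StartsWith("#")))
        {
            foreach (var domain in line.ToLowerInvariant().Split(';'))
            {
                // Assumption: if a domain appears on several lines, the first line wins.
                if (!owners.ContainsKey(domain))
                {
                    owners[domain] = companyId;
                }
            }
            companyId++;
        }

        return owners;
    }

    // True when both domains are known and were found on the same line.
    public static bool SameOwner(Dictionary<string, int> owners, string domainA, string domainB)
    {
        return owners.TryGetValue(domainA.ToLowerInvariant(), out int companyA)
            && owners.TryGetValue(domainB.ToLowerInvariant(), out int companyB)
            && companyA == companyB;
    }
}
```

Each check is then two dictionary lookups instead of a scan over every line.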

Thanks. After some reading, a HashSet appears to be the right direction. I thought it might be possible to have multiple conditions but wasn't quite sure how to go about it. – Ipsum, Apr 25 '18 at 21:51

Steve · Answer 2 · 2018-04-26T06:23:48.050

I think you need another pass to correctly extract the domain names from your file.

First, use ReadLines instead of ReadAllLines. Next, check for lines consisting only of whitespace, not just for empty lines. Finally, after converting each line to lower case, split it on the semicolon and flatten the resulting arrays into the list with SelectMany.
If you need to remove duplicate domains, use Distinct.

_CompanyOwnedDomains = File
        .ReadLines(CompanyOwnedDomainsFileName)
        .Where(line => !String.IsNullOrWhiteSpace(line))
        .Where(line => !line.StartsWith("#"))
        .SelectMany(line => line.ToLower().Split(';'))
        .Distinct().ToList();
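The same pipeline can be written against any IEnumerable<string>, which makes it easy to try without a file; File.ReadLines(path) returns such a sequence. The Trim() calls and StringSplitOptions.RemoveEmptyEntries go beyond the answer and assume the file might contain stray spaces or a trailing ';'. DomainFile and ParseAllDomains are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper wrapping the pipeline above.
public static class DomainFile
{
    // With a real file: DomainFile.ParseAllDomains(File.ReadLines(path))
    public static List<string> ParseAllDomains(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Where(line => !line.TrimStart().StartsWith("#"))
            // Split on ';' and drop empty entries such as the one after a trailing ';'.
            .SelectMany(line => line.ToLowerInvariant()
                .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
            // Remove spaces around each entry, then discard anything left empty.
            .Select(domain => domain.Trim())
            .Where(domain => domain.Length > 0)
            .Distinct()
            .ToList();
    }
}
```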

Now each domain is a separate entry, so you don't need to worry about false positives from substring Contains: you can use the Any method on the list with exact comparisons to check whether your search values are present.

bool exist = _CompanyOwnedDomains.Any(x => x == "example4.net" || x == "example8.io");
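The condition above is true when either domain is in the list, not necessarily both. If both must be present, two lookups against a HashSet<string> express that directly and avoid a linear scan. This still does not show that the two domains belong to the same company, since the flattened list no longer records which line each came from. KnownDomains and BothKnown are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical helper for an "are both domains known?" check.
public static class KnownDomains
{
    // True only if both domains are present; the set is assumed to hold lower-case entries.
    public static bool BothKnown(HashSet<string> domains, string domainA, string domainB) =>
        domains.Contains(domainA.ToLowerInvariant()) && domains.Contains(domainB.ToLowerInvariant());
}
```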

edited Apr 26 '18 at 06:23

answered Apr 25 '18 at 21:46

Steve

Thanks. That certainly makes things easier. The more C# I'm reading, the more I'm liking it as a language. – Ipsum Apr 25 '18 at 21:53

I hadn't noticed the CSV inside each line. Mind if I update my answer with that suggestion? It certainly looks better. – Camilo Terevinto Apr 25 '18 at 21:54

I think this answer should mention the inefficiency of using `ReadAllLines()` in this context. I think OP would be better off using `ReadLines()`, right? – maccettura Apr 25 '18 at 22:32