I have a LINQ statement that needs to read through a text file. It takes a list of URLs and strips them back to get the domains. I then want to take the unique domains and write them back out to a file.

Here's what I have so far:

    var urls = File.ReadAllLines(badLinks)
        .Where(x => x.IsNotNullOrEmpty())
        .Select(x => ManipulateUrl(x))
        .Distinct()
        .ToList();

The thing is, I've noticed that the Distinct() call only seems to include a domain if the manipulated entry is distinct, when what I really want is to re-evaluate the list after the changes and create a list of unique entries (domains).

Any help appreciated.

* UPDATE *

Sorry guys, after breaking down the list it turns out that the problem is in the source file. It was difficult to see with 100k records in it.
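The update does not say what was wrong with the source file, so the following is only a sketch of one way to surface such problems in a large list. It assumes the culprits are entries that differ only by surrounding whitespace or letter case; `DuplicateInspector` and `FindNearDuplicates` are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateInspector
{
    // Groups raw lines by a normalized key (trimmed, lower-cased) and returns
    // only the groups that contain more than one distinct raw spelling.
    // Assumption: whitespace and case are the differences worth looking for.
    public static Dictionary<string, List<string>> FindNearDuplicates(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .GroupBy(line => line.Trim().ToLowerInvariant())
            .Select(g => new { g.Key, Variants = g.Distinct().ToList() })
            .Where(x => x.Variants.Count > 1)
            .ToDictionary(x => x.Key, x => x.Variants);
    }
}
```

Calling `DuplicateInspector.FindNearDuplicates(File.ReadAllLines(badLinks))` would return only the suspicious groups, which is easier to read than scanning 100k records by eye.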

dotnetnoob
  • What is `ManipulateUrl` returning, a `string`? – Selman Genç Feb 03 '14 at 13:38
  • If a URL starts off as 'http://mydomain.com/aff=123', the string returned will be 'http://mydomain' – dotnetnoob Feb 03 '14 at 13:41
  • Could you re-explain the issue? I can't understand what's wrong. – Douglas Feb 03 '14 at 13:53
  • The output items of your list are determined by the `Select`. Assuming you want to keep the original items but filter them based on a selector, this is a duplicate of [this question](http://stackoverflow.com/questions/742682/distinct-list-of-objects-based-on-an-arbitrary-key-in-linq). – nmclean Feb 03 '14 at 14:06

1 Answer

Your code certainly looks correct; the only thing that springs to mind is whether ManipulateUrl is throwing it off.

Have you tried splitting the code into two separate statements, i.e.:

    var urls = File.ReadAllLines(badLinks)
        .Where(x => x.IsNotNullOrEmpty())
        .Select(x => ManipulateUrl(x));

    var distinctURLS = urls.Distinct().ToList();

At least doing it this way you can step through the code and verify that urls is being populated as you'd expect.
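A self-contained sketch of the two-step version might look like the following. The real `ManipulateUrl` is not shown in the question, so the one here is a hypothetical stand-in that keeps the scheme and host via `Uri.GetLeftPart`; `DomainExtractor` and `UniqueDomains` are likewise made-up names. The intermediate result is materialized with `ToList()` because a bare `Select` is lazy and would show nothing useful in the debugger until it is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainExtractor
{
    // Hypothetical stand-in for the question's ManipulateUrl: returns the
    // scheme and host of an absolute URL, or the trimmed input unchanged
    // when it cannot be parsed as one.
    public static string ManipulateUrl(string url)
    {
        var trimmed = url.Trim();
        return Uri.TryCreate(trimmed, UriKind.Absolute, out var uri)
            ? uri.GetLeftPart(UriPartial.Authority)
            : trimmed;
    }

    public static List<string> UniqueDomains(IEnumerable<string> lines)
    {
        // Step 1: transform every non-blank line and materialize the result
        // so it can be inspected before de-duplication.
        var manipulated = lines
            .Where(x => !string.IsNullOrWhiteSpace(x))
            .Select(ManipulateUrl)
            .ToList();

        // Step 2: de-duplicate the transformed values, ignoring case.
        return manipulated.Distinct(StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

Under these assumptions, `DomainExtractor.UniqueDomains(File.ReadAllLines(badLinks))` gives the unique domains, and `File.WriteAllLines` can write them back out to a file of your choosing.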

Gavin Coates