I found several questions on Stack Overflow about Directory.GetFiles(), but they all explain how to use it to find a specific extension or a set of files matching multiple criteria. In my case, I want a search pattern for Directory.GetFiles(), using regular expressions, that returns all of the files in the directory except the set I specify. In other words, I want to declare not the set I want but its complement: for example, all of the files in a directory except the HTML files. Note that I know it can be achieved this way:

var filteredFiles = Directory
.GetFiles(path, "*.*")
.Where(file => !file.ToLower().EndsWith("html"))
.ToList();

But this is not a very reusable solution: if I later want to filter out another kind of file, I have to change the code and add another condition to the Where clause. I'm looking for something that lets me create a regex describing the files I don't want and pass it to Directory.GetFiles(). Then, if I want to exclude more extensions later, I only have to change the regex.

javier_el_bene

3 Answers

You don't need a regex if you want to filter extension(s):

// for example a field or property in your class
private HashSet<string> ExtensionBlacklist { get; } =
    new HashSet<string>(StringComparer.InvariantCultureIgnoreCase)
    {
        ".html",
        ".htm"
    };
// ...

var filteredFiles = Directory.EnumerateFiles(path, "*.*")
    .Where(fn => !ExtensionBlacklist.Contains(System.IO.Path.GetExtension(fn)))
    .ToList();
Tim Schmelter

I would recommend against using regex in favor of something like this:

var filteredFiles = Directory
    .GetFiles(path, "*.*")
    .Where(file => !excludedExtensions.Any<string>((extension) =>
        file.EndsWith(extension, StringComparison.CurrentCultureIgnoreCase)))
    .ToList();

You can pass it a collection of your excluded extensions, e.g.:

var excludedExtensions = new List<string>(new[] {".html", ".xml"});

The Any will short-circuit as soon as it finds a match on an excluded extension, so I think this is preferable even to excludedExtensions.Contains(). As for the regex, I don't think there's a good reason to use that given the trouble it can buy you. Don't use regex unless it's the only tool for the job.

rory.ap

So essentially you just don't know how to perform a regex match on a string?

There is Regex.IsMatch for that very purpose. However, you could also change the code to look up the extension in a set of extensions to filter, which would also allow you to easily add new filters.
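
For what it's worth, the searchPattern argument of Directory.GetFiles() only understands the * and ? wildcards, so a regex has to be applied after enumeration. A minimal sketch of that idea follows. FileFilter and GetFilesExcept are hypothetical names made up for this example, not part of System.IO, and matching against the file name only (rather than the full path) is an assumption about what you want:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class FileFilter
{
    // Hypothetical helper (not a framework API): returns every file in `path`
    // whose file name does NOT match `excludePattern`.
    public static List<string> GetFilesExcept(string path, string excludePattern)
    {
        // Case-insensitive, so ".HTML" is excluded along with ".html".
        var exclude = new Regex(excludePattern, RegexOptions.IgnoreCase);

        return Directory.EnumerateFiles(path)
            // Assumption: test only the file name, not the directory part of the path.
            .Where(file => !exclude.IsMatch(Path.GetFileName(file)))
            .ToList();
    }
}
```

It could then be called as `FileFilter.GetFilesExcept(path, @"\.(html?|xml)$")`, and excluding another extension later would only mean changing the pattern string.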

Joey