4

I'm looking for a simple and powerful way to implement Windows-flavoured * and ? wildcard matching in strings.

BeginsWith() and EndsWith() are too simple to cover all cases, while translating wildcard expressions to regexes looks too complex, and I'm not sure about the performance.

I'm looking for a happy medium.

EDIT: I'm trying to parse a .gitignore file and match the same files as Git does. This means:

  • The file must be outside the repository's index (so I check the file's path against the paths stored in the index);
  • The number of patterns in .gitignore can be large;
  • The number of files to check might also be large.
shytikov
  • 3
    Very vague. Post some inputs with the desired outputs. – H H Jan 12 '12 at 21:25
  • 1
    @Henk, most Windows people will know what's meant: * is any number of characters, ? is exactly one unknown character. ?blah*.txt would match anything with one character before blah, any number of characters after blah, and ending in .txt – Keith Nicholas Jan 12 '12 at 21:29
  • @HenkHolterman, I'm parsing `.gitignore` file and in my library I need to achieve the same behaviour as original Git offers. – shytikov Jan 12 '12 at 21:37
  • 3
    You can check this post out: http://stackoverflow.com/questions/188892/glob-pattern-matching-in-net – lbergnehr Jan 12 '12 at 21:41
  • @seldon, thanks! It's pretty close to what I'm actually searching for! – shytikov Jan 12 '12 at 21:54
  • Note that Windows pattern matching on files has a few idiosyncratic legacy features. I don't remember all of them, but some were related to matching the 8.3 name too. – CodesInChaos Jan 12 '12 at 22:22

4 Answers

4

The equivalents of the Windows wildcards ? and * in regex are just . and .*.
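A minimal sketch of that translation, assuming plain `*` and `?` semantics only (no 8.3 names or other Windows legacy quirks). The class and method names are hypothetical. `Regex.Escape()` neutralises dots, braces and the other metacharacters first; it also turns `*` into `\*` and `?` into `\?`, which are then swapped for their regex equivalents:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of .NET.
public static class WildcardRegex
{
    // Builds a regex that matches a whole string against a wildcard pattern.
    public static Regex ToRegex(string wildcard)
    {
        // Escape every regex metacharacter, then restore the two wildcards.
        string body = Regex.Escape(wildcard)
            .Replace(@"\*", ".*")   // * : any run of characters, including none
            .Replace(@"\?", ".");   // ? : exactly one character

        // Anchor both ends so the pattern must cover the entire input.
        // IgnoreCase is an assumption mirroring Windows file names;
        // drop it if case-sensitive matching is wanted.
        return new Regex("^" + body + "$",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }
}
```

With this sketch, `WildcardRegex.ToRegex("?blah*.txt").IsMatch("xblah1.txt")` should be true, while `"blah.txt"` should not match because `?` needs one character before `blah`. When many files are checked against the same pattern, building the `Regex` once and reusing it avoids repeated parsing.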


[Edit] Given your new edit (stating that you're looking for actual files), I would skip the translation altogether and let .NET do the searching using Directory.GetFiles().

(Note that, for some reason, passing a ? into Directory.GetFiles() matches "zero or one characters," whereas in Windows it always matches exactly one character.)

BlueRaja - Danny Pflughoeft
  • Personally, I don't like the idea of translating to regexes, because a lot of other things need to be translated as well. For example, dots and braces need to be escaped. And I cannot guarantee that users will be accurate when entering their wildcards. This makes the approach too complex. – shytikov Jan 12 '12 at 21:39
  • 4
    Writing your own pattern matching is certainly more complex than translating to regex. – CodesInChaos Jan 12 '12 at 21:45
  • 1
    @Alexey: See [Regex.Escape()](http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.escape.aspx) – BlueRaja - Danny Pflughoeft Jan 12 '12 at 21:53
  • @BlueRaja: Thanks, that's what I've been looking for! – shytikov Jan 12 '12 at 21:57
  • @CodesInChaos Not really in this case, check my answer for another question: http://stackoverflow.com/a/16488364/119561 – deerchao May 10 '13 at 18:35
2

To get an exact match including all corner-cases, use

System.IO.Directory.GetFiles(myPath, myPattern)

You may have to create some temp files from your target strings first.

In other words, I think you should keep your patterns dry until it's time to meet the filesystem.
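The temp-file idea could look roughly like the sketch below. It assumes the target strings are bare file names that are legal on the current filesystem; the class and method names are hypothetical. It creates a scratch directory, touches one empty file per name, and lets `Directory.GetFiles()` decide what matches:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of .NET.
public static class TempFileMatcher
{
    // Returns the names that Directory.GetFiles() considers a match for pattern.
    public static string[] Match(string[] names, string pattern)
    {
        // Scratch directory with a random name under the user's temp folder.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        try
        {
            // Create one empty file per candidate name.
            foreach (string name in names)
                File.Create(Path.Combine(dir, name)).Dispose();

            // Let the framework apply its own wildcard rules.
            return Directory.GetFiles(dir, pattern)
                .Select(p => Path.GetFileName(p))
                .ToArray();
        }
        finally
        {
            // Always clean up the scratch directory.
            Directory.Delete(dir, true);
        }
    }
}
```

This touches the disk for every candidate, so it is better suited to checking corner cases than to matching large numbers of paths.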

H H
  • It's almost useless for my case: I already get all the files in the folder the way you describe, then check which of them are outside the Git index. Among the files that are left, I search for those that don't match the set of patterns in `.gitignore`. – shytikov Jan 13 '12 at 07:50
1

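Converting * and ? to regex is quite easy.
For ?, replace the "?" with "." (equivalently ".{1}"), and for *, replace the "*" with ".*". Note that ".+?" would require at least one character, whereas * also matches an empty run.

With the pattern anchored by ^ and $ and the other regex metacharacters escaped, that should get you the same behaviour as basic wildcard matching on Windows.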

EDIT: The Win32 function PathMatchSpec(input, pattern), which returns a Boolean, will also do the job. A VB.NET declaration:

Private Declare Auto Function PathMatchSpec Lib "shlwapi" (ByVal pszFileParam As String, ByVal pszSpec As String) As Boolean
Sam Axe
1

You should go with a regex-based approach unless your data volume is humongous or you have measurements showing that regex will severely impact performance.

If that is the case, any other off-the-shelf solution will likely affect performance as well, and you will probably need to hand-roll something.
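For reference, a hand-rolled matcher does not have to be large. The sketch below is one common iterative approach (remember the last `*` and backtrack to it on a mismatch), not something taken from Git or Windows. The names are hypothetical, and it assumes case-sensitive matching with only `*` and `?` as special characters:

```csharp
using System;

// Hypothetical helper, not part of .NET.
public static class WildcardMatcher
{
    // True when the whole of text matches pattern, where '*' matches any run
    // of characters (including none) and '?' matches exactly one character.
    public static bool IsMatch(string text, string pattern)
    {
        int t = 0, p = 0;
        int starP = -1; // position of the most recent '*' in pattern, or -1
        int starT = 0;  // position in text where that '*' began matching

        while (t < text.Length)
        {
            if (p < pattern.Length && pattern[p] == '*')
            {
                // Record the star and first try matching it against nothing.
                starP = p++;
                starT = t;
            }
            else if (p < pattern.Length &&
                     (pattern[p] == '?' || pattern[p] == text[t]))
            {
                // Single-character match: advance both cursors.
                p++;
                t++;
            }
            else if (starP >= 0)
            {
                // Mismatch: let the last '*' absorb one more character and retry.
                p = starP + 1;
                t = ++starT;
            }
            else
            {
                return false;
            }
        }

        // Text is exhausted; only trailing stars may remain in the pattern.
        while (p < pattern.Length && pattern[p] == '*') p++;
        return p == pattern.Length;
    }
}
```

Under these assumptions, `WildcardMatcher.IsMatch("xblah123.txt", "?blah*.txt")` should be true and `WildcardMatcher.IsMatch("blah.txt", "?blah*.txt")` false. The matcher allocates nothing per call, which matters when both the pattern list and the file list are large, but it should still be measured against a cached regex before being preferred.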

Miserable Variable