
MSDN says:

HANDLE WINAPI FindFirstFile( LPCTSTR lpFileName, LPWIN32_FIND_DATA lpFindFileData );

lpFileName: The directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?)...

Until today I hadn't noticed the “for example”.

Assuming you have a “c:\temp” directory, the code below displays “temp”. Notice the search pattern: “c:\temp>”. If you also have a “c:\temp1” directory and a “c:\tem” directory, FindNextFile will find “temp1” but will not find “tem”. I assumed that ‘<’ would find “tem”, but I was wrong: it behaves in the same way. It does not matter how many ‘<’ or ‘>’ characters you append: the behavior is the same.

From my point of view, this is a bug ('>' and '<' are not valid characters in a file name). From Microsoft’s point of view, it may be a feature.

I did not manage to find a complete description of the behavior of FindFirstFile/FindNextFile (F*F).

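// Needs <windows.h> and an MFC application (CString, AfxMessageBox).
void ListMatches()
{
  // Note the trailing '>' after the directory name.
  const TCHAR* s = _T("c:\\temp>");
  WIN32_FIND_DATA d;

  HANDLE h = FindFirstFile( s, &d );
  if ( h == INVALID_HANDLE_VALUE )
  {
    CString m;
    // GetLastError() returns a DWORD, so use %lu rather than %d.
    m.Format( _T("FindFirstFile failed (%lu)\n"), GetLastError() );
    AfxMessageBox( m );
    return;
  }

  // FindFirstFile reports the first match ("temp");
  // FindNextFile reports the remaining ones (e.g. "temp1").
  do
  {
    AfxMessageBox( d.cFileName );
  } while ( FindNextFile( h, &d ) );

  FindClose( h );
}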

Edit 1:

Initially I tried to use the Windows implementation of _stat. It handled the illegal characters ‘*’ and ‘?’ correctly, but ignored ‘>’, so I stepped into it with the debugger and noticed that the implementation took special care of the documented wildcards. I ended up in FindFirstFile (FFF).
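One way a FindFirstFile-based _stat could avoid this is to refuse any path containing a character that FindFirstFile may interpret as a wildcard, not only the documented ‘*’ and ‘?’ but also ‘<’, ‘>’ and ‘"’ (DOS_STAR, DOS_QM and DOS_DOT in ntifs.h). The sketch below is a hypothetical helper, not the CRT's actual code; the name ContainsFindWildcard is made up for illustration. All five characters are reserved in Windows file names, so rejecting them should not reject any legitimate file name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-check for a FindFirstFile-based stat wrapper (not the
// CRT's code): returns true if the path contains '*' or '?', or one of the
// DOS wildcard characters '<', '>' and '"'.
bool ContainsFindWildcard(const std::string& path) {
  // A "\\?\" prefix contains a '?' that is not a wildcard, so skip it.
  const std::string prefix = "\\\\?\\";
  const std::size_t start =
      path.compare(0, prefix.size(), prefix) == 0 ? prefix.size() : 0;
  return path.find_first_of("*?<>\"", start) != std::string::npos;
}
```

A wrapper would call this before FindFirstFile and fail with "not found" instead of reporting whatever the pattern happens to match.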

Edit 2:

I have filed two bug reports: one for FindFirstFile, the other for _stat. I am now waiting for Microsoft’s answer.

I do not think it is normal to peek into something that is supposed to be a black box and speculate. Therefore, my objections are based on what the “contract” says: “lpFileName [in] The directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?). …” I am not a native English speaker. Maybe it means “these are not the only wildcards”, maybe not. However, if these are not the only wildcards, they should have listed all of them (maybe they will). At this point, I think Microsoft’s resolution will be “By Design” or “Won’t Fix”.

Regarding _stat, which I think is an ISO function, MSDN says: “Return value: Each of these functions returns 0 if the file-status information is obtained.” It says nothing about wildcards, documented or not. I do not see what kind of information _stat could retrieve from “c:\temp*” or “c:\temp>>”. It is highly unlikely that anyone relies on the current behavior, so they may issue a fix.

Edit 3:

Microsoft has closed the _stat bug as Fixed.

"... We have fixed this for the next major release of Visual Studio (this will be Visual Studio “14,” but note that the fix is not present in the Visual Studio “14” CTP that was released last week). In Visual Studio “14,” the _stat functions now use CreateFile to query existence and properties of a path. The change to use CreateFile was done to work around other quirks related to file permissions that were present in the old FindFirstFile-based implementation, but the change has also resolved this issue. ..."

zdf
  • I was able to reproduce the behavior on Windows 7 Home. I had to call `FindNextFile()` to find `c:\temp1` as `c:\temp` was reported first by `FindFirstFile()`. If I remove the `>`, only `c:\temp` is found. Seems like Microsoft is internally treating `>` as if it were `*` or `?` instead. Did you try using other illegal characters to see if they exhibit the same behavior? – Remy Lebeau Jun 12 '14 at 17:43
  • I just did: Space " . > – zdf Jun 12 '14 at 18:06
  • Same behavior. No yelling. " and . are legal, of course, but I did not expect the same result. I expected some kind of error [no matching "], for instance. The good thing is it didn't swallow |. – zdf Jun 12 '14 at 18:09
  • Sorry, " is not legal. I guess the behavior for the period (.) is normal, since "c:\\temp" is the same thing as "c:\\temp." – zdf Jun 12 '14 at 18:30
  • I gave it a go on Windows 7. `<` and `>` aren't quite the same. `<` behaves like `*` and `>` behaves like `?`. `"` is different: it is ignored at the end of a string but otherwise prevents matching. I don't know why this happens. – arx Jun 12 '14 at 18:40
  • If this _is_ buggy, [it certainly wouldn't be for the first time](http://www.codeguru.com/cpp/w-p/files/article.php/c4441/Work-Around-the-Bug-of-Deprecated-DOS-Wildcards.htm). – Lightness Races in Orbit Jun 12 '14 at 19:22
  • The question has [been asked before (10k+)](http://stackoverflow.com/q/10141949/560648) but was deleted and I can't tell why. – Lightness Races in Orbit Jun 12 '14 at 19:26
  • @LightnessRacesinOrbit: That had nothing to do with programming, so it was off-topic. Also extremely unclear. – Ben Voigt Jun 12 '14 at 20:12
  • @BenVoigt: Oh good point, it's off-topic. Hadn't actually spotted that. As for quality, I'm not disputing that it's a poor question but the closure by "Community" and no comments at all threw me off, that's all. – Lightness Races in Orbit Jun 12 '14 at 20:17
  • @RemyLebeau Suggest you remove your comment about '>' not being a wildcard - it definitely is (although an extremely obscure one). – nobody Jun 12 '14 at 22:08
  • @AndrewMedico: but `>` is not an officially supported wildcard at Win32 API layer, it is supported by a lower-level API and just happens to get silently bubbled up the layers and is not documented as such. – Remy Lebeau Jun 12 '14 at 23:19
  • @RemyLebeau I think if Microsoft didn't intend for it to be an officially supported wildcard, they would have done something about it by now. Note how the MSDN says "... can include wildcard characters, *for example*, an asterisk (*) or a question mark (?)" - which clearly implies that `*` and `?` are not the only officially supported wildcard characters. – nobody Jun 12 '14 at 23:32
  • The `[MS-FSA]` document (aka: File System Algorithms) details these. It's essentially a word by word explanation (they even call it pseudo code) of what happens in `FsRtlIsNameInExpression`. @AndrewMedico totally agree. That comment should be removed as it's misleading. Flagged it. – 0xC0000022L Oct 17 '18 at 13:40
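The space and period results in the comments above are most likely ordinary Win32 path normalization rather than wildcard matching: when a path does not end in a separator, trailing periods and spaces are removed, so “c:\temp.” and “c:\temp ” end up naming “c:\temp”. Below is a rough sketch of that single rule; TrimTrailingDotsAndSpaces is a hypothetical name, and the real normalization also handles separators and relative segments, while paths with the `\\?\` prefix skip it entirely.

```cpp
#include <cstddef>
#include <string>

// Hypothetical approximation of one Win32 normalization step: if the path
// does not end in a separator, drop all trailing periods and spaces.
std::string TrimTrailingDotsAndSpaces(const std::string& path) {
  const std::size_t sep = path.find_last_of("\\/");
  const std::string last =
      sep == std::string::npos ? path : path.substr(sep + 1);
  // Path ends in a separator, or the last segment is "." or "..":
  // those follow other normalization rules, so leave the path alone.
  if (last.empty() || last == "." || last == "..") return path;
  const std::size_t end = path.find_last_not_of(". ");
  return end == std::string::npos ? std::string() : path.substr(0, end + 1);
}
```

Under this rule “c:\temp.”, “c:\temp ” and “c:\temp. .” all reduce to “c:\temp”, which matches zdf's observation that the period and the space behave the same as the plain name.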

1 Answer


According to a post on the OSR ntfsd list from 2002, this is an intentional feature of NtQueryDirectoryFile/ZwQueryDirectoryFile via FsRtlIsNameInExpression. < and > correspond to * and ?, but perform matching "using MS-DOS semantics".

The FsRtlIsNameInExpression documentation states:

The following wildcard characters can be used in the pattern string.

Wildcard character  Meaning

* (asterisk)        Matches zero or more characters.

? (question mark)   Matches a single character.

DOS_DOT             Matches either a period or zero characters beyond the name
                    string.

DOS_QM              Matches any single character or, upon encountering a period
                    or end of name string, advances the expression to the end of
                    the set of contiguous DOS_QMs.

DOS_STAR            Matches zero or more characters until encountering and
                    matching the final . in the name.

For some reason, this page does not give the values of the DOS_* macros, but ntifs.h does:

//  The following constants provide addition meta characters to fully
//  support the more obscure aspects of DOS wild card processing.

#define DOS_STAR        (L'<')
#define DOS_QM          (L'>')
#define DOS_DOT         (L'"')
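To make the table concrete, here is a minimal sketch of a matcher that follows those descriptions. It is a hypothetical re-implementation, not Microsoft's code: the names dosmatch, MatchFrom and IsNameInExpression are made up, it works on ASCII std::string instead of UNICODE_STRING, compares case-insensitively, backtracks naively, and reads DOS_STAR as "any characters, but not past the name's final period". The real kernel routine may differ in edge cases.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace dosmatch {

constexpr char kDosStar = '<';  // DOS_STAR in ntifs.h
constexpr char kDosQm = '>';    // DOS_QM in ntifs.h
constexpr char kDosDot = '"';   // DOS_DOT in ntifs.h

// Case-insensitive comparison of two ASCII characters.
inline bool SameChar(char a, char b) {
  return std::toupper(static_cast<unsigned char>(a)) ==
         std::toupper(static_cast<unsigned char>(b));
}

// Does expr[p..] match name[n..]?
inline bool MatchFrom(const std::string& expr, std::size_t p,
                      const std::string& name, std::size_t n) {
  if (p == expr.size()) return n == name.size();
  const char c = expr[p];

  if (c == '*') {  // zero or more characters
    for (std::size_t k = n; k <= name.size(); ++k)
      if (MatchFrom(expr, p + 1, name, k)) return true;
    return false;
  }
  if (c == kDosStar) {
    // Zero or more characters, but (in this sketch) never past the final
    // '.' of the name; with no '.' ahead it behaves like '*'.
    const std::size_t dot = name.rfind('.');
    const std::size_t limit =
        (dot == std::string::npos || dot < n) ? name.size() : dot;
    for (std::size_t k = n; k <= limit; ++k)
      if (MatchFrom(expr, p + 1, name, k)) return true;
    return false;
  }
  if (c == '?')  // exactly one character
    return n < name.size() && MatchFrom(expr, p + 1, name, n + 1);
  if (c == kDosQm) {
    // One character, unless we are at a '.' or at the end of the name:
    // then skip the whole run of contiguous DOS_QMs, consuming nothing.
    if (n < name.size() && name[n] != '.')
      return MatchFrom(expr, p + 1, name, n + 1);
    std::size_t q = p;
    while (q < expr.size() && expr[q] == kDosQm) ++q;
    return MatchFrom(expr, q, name, n);
  }
  if (c == kDosDot) {
    // A period, or nothing once the name is exhausted.
    if (n == name.size()) return MatchFrom(expr, p + 1, name, n);
    return name[n] == '.' && MatchFrom(expr, p + 1, name, n + 1);
  }
  // Any other character must match literally.
  return n < name.size() && SameChar(c, name[n]) &&
         MatchFrom(expr, p + 1, name, n + 1);
}

inline bool IsNameInExpression(const std::string& expr,
                               const std::string& name) {
  return MatchFrom(expr, 0, name, 0);
}

}  // namespace dosmatch
```

With these rules "temp>" matches temp and temp1 but not tem, "temp<" matches temp1 but not tem, and "temp\"" matches temp but not temp1, which lines up with the observations in the question and its comments.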
nobody
  • Does "MS-DOS semantics" also mean it's restricted to 8.3 names, or is it indifferent to this? – Jongware Jun 12 '14 at 20:55
  • There is an explanation in the [FsRtlIsNameInExpression documentation](http://msdn.microsoft.com/en-us/library/windows/hardware/ff546850%28v=vs.85%29.aspx) but I'm not entirely sure how to interpret it. It seems to be focused on matching the "name string" (i.e. before the extension), not 8.3 names. – nobody Jun 12 '14 at 20:58
  • It's baffling to me that the `FsRtlIsNameInExpression` documentation wouldn't actually identify those three characters. So typical of MSDN. – Lightness Races in Orbit Jun 12 '14 at 21:45
  • The `[MS-FSA]` document (aka: File System Algorithms) which Microsoft disclosed, details these. But `ntifs.h` also appears to have `DOS_QM` and friends (checked 7600.16385.1 WDK). And besides it is not at all baffling, because they have symbolic names. Many of the constants with symbolic names are not shown with their underlying value in the documentation. And for a reason: you want to prevent users from actually using the underlying value without referring to the name. Because you can then change the definition in future versions in a deterministic fashion. – 0xC0000022L Oct 17 '18 at 13:33