
I tried to get a full list of all the files in a folder like this:

#include <Windows.h>
#include <iostream>
#include <stdio.h>

using namespace std;

void main()
{
    HANDLE dateiHandle;
    WIN32_FIND_DATA wfindD;

    dateiHandle = FindFirstFile(L"E:\\Roman\\PIC\\funpics\\*", &wfindD);
    do
    {
        cout << wfindD.cFileName << endl;
    } while (FindNextFile(dateiHandle, &wfindD));

    FindClose(dateiHandle);
    while (1)
    {
    }
}

and I can't figure out why the results are like this:

00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
00AFFCCC
Jabberwocky
Arkanipro MA
  • Even if the names were hex codes because of some bug, they would not be the same for every filename. The API is usually not the problem... – Anders Mar 10 '18 at 03:18
  • Running this code under a debugger would have given away the type of `wfindD.cFileName` immediately. And while there is no overload for [`operator<<(std::basic_ostream)`](http://en.cppreference.com/w/cpp/io/basic_ostream/operator_ltlt2) for the type of `wfindD.cFileName`, it should be obvious, that it will not interpret the data pointed to, but print the pointer value. Downvoted for all the right reasons. – IInspectable Mar 10 '18 at 03:46
  • The dupes are not so good (`wcout` won't display non-ASCII characters in the console without further ado). [Here is a better one](https://stackoverflow.com/q/2492077/7571258). – zett42 Mar 10 '18 at 09:37

2 Answers


TCHAR is typedef'd to wchar_t if Unicode support is enabled in your project (the default in recent versions of Visual Studio). std::cout doesn't have any special handling for a wchar_t*, so it falls back on the const void* overload of operator<<, which just prints the pointed-to memory address as a hex number. Use std::wcout instead; it does have an operator<< overload for wchar_t* and will print the strings as you expect.

As a side note, you'll have fewer surprises if you always explicitly use the A (for ANSI) or W (for wide) names for Win32 functions and structures that handle strings. To support non-ASCII strings, you're generally better off using the W versions: in this case FindFirstFileW, FindNextFileW, and WIN32_FIND_DATAW. FindClose doesn't interact with strings directly, so it has no A or W variant.
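Putting both points together, a corrected version of the program might look like this (a sketch: the original path is kept, and a handle-validity check is added, which the original code was missing):

```cpp
#include <Windows.h>
#include <iostream>

int main()
{
    WIN32_FIND_DATAW wfindD;  // explicit W structure to match the W functions

    HANDLE dateiHandle = FindFirstFileW(L"E:\\Roman\\PIC\\funpics\\*", &wfindD);
    if (dateiHandle == INVALID_HANDLE_VALUE)
    {
        // FindFirstFileW fails if the directory doesn't exist or can't be read
        std::wcout << L"FindFirstFileW failed, error " << GetLastError() << L'\n';
        return 1;
    }

    do
    {
        // wcout has a wchar_t* overload, so the name is printed, not a pointer
        std::wcout << wfindD.cFileName << L'\n';
    } while (FindNextFileW(dateiHandle, &wfindD));

    FindClose(dateiHandle);
    return 0;
}
```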

Miles Budnek
  • Unicode is the default in projects created using the New Project wizard only in recent versions of Visual Studio (I believe since VS 2015, or maybe VS 2012). Dev-C++ or Code::Blocks default to ANSI encoding, to this day. So no, Unicode is not the default, as you put it. Although it should be, and should have been for almost 2 decades. – IInspectable Mar 10 '18 at 03:27
  • @IInspectable Unicode is the default in VS2010, and that's the oldest I have installed at the moment to check. I suppose it's true that I shouldn't assume Windows == Visual Studio though. – Miles Budnek Mar 10 '18 at 05:47
  • _TCHAR, _tmain, FindFirstFileA, etc was to support building applications for the low-end/low-requirements branch of Windows (Windows 95 through Me) in addition to the normal branch (Windows NT 4 through 10). That should be just trivia since that branch died 2001-2006. But, unfortunately, TCHAR is still around in the MSVC toolset. You don't need it. – Tom Blodget Mar 10 '18 at 18:45

Use std::wcout instead of std::cout and you'll see the correct names printed out. 1

Your app is compiled for Unicode, so you're really calling FindFirstFileW(), which modifies a WIN32_FIND_DATAW structure, whose cFileName member is type WCHAR[], which is a double-byte "wide" character string.

1 Although, if the file names really do have double-byte characters (over 255), such as Japanese, then you may need to tweak other settings in your Command Prompt to actually see the double-byte characters correctly.
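One such tweak, on the MSVC toolchain, is to switch stdout into UTF-16 mode with _setmode before writing to std::wcout; this is a Windows-only sketch (it requires <io.h> and <fcntl.h>, and a console font containing the needed glyphs):

```cpp
#include <fcntl.h>
#include <io.h>
#include <iostream>

int main()
{
    // MSVC-specific: put stdout in UTF-16 text mode so wide output is not
    // squeezed through the console code page.
    _setmode(_fileno(stdout), _O_U16TEXT);

    // Characters outside the console code page now survive, e.g. kanji:
    std::wcout << L"\u65E5\u672C\u8A9E" << L'\n';
    return 0;
}
```

Note that after _setmode(_O_U16TEXT), mixing in narrow output via std::cout or printf will fail, so pick one mode for the whole program.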

Remy Lebeau
Dan Korn
  • `wcout` operates on `wchar_t`. `wchar_t` on Windows means UTF-16. All code points are encoded using at least 2 bytes in UTF-16. You are confusing code points and code units, and I don't even know, what you mean by *"double-byte characters (over 255)"*. That doesn't make **any** sense. Besides, the console is very much capable of displaying UTF-16, so no other tweaks or settings are required. This contribution is not helpful. – IInspectable Mar 10 '18 at 03:33
  • @IInspectable [Not so much](https://i.stack.imgur.com/WD62t.png). Getting the windows console to print anything other than ASCII text is a chore. – Miles Budnek Mar 10 '18 at 06:32
  • @MilesBudnek: The Windows Console is perfectly capable of displaying **any** Unicode code point. The chore of which you speak is getting the C++ I/O Stream library to cooperate. [The console has no issues](https://imgur.com/a/TxcQX), assuming a font with appropriate glyphs is installed. The default console font (Lucida Console) supports the entire BMP, that clearly contains loads of characters that aren't ASCII characters. – IInspectable Mar 10 '18 at 12:40
  • ["double-byte"](https://msdn.microsoft.com/en-us/library/windows/desktop/dd317794%28v=vs.85%29.aspx) is not normally used in relation to any Unicode encoding, even UCS-2. – Tom Blodget Mar 10 '18 at 18:43
  • Would you be happy if I said "wchar_t values over 255"? Also, on my Windows 10 system, the default font for the Command Prompt is Consolas, and the default encoding is code page 437. So Japanese just looks like question marks. Yes, if you use a font that supports most of the UCS-2 range, and the correct code page, you can get Japanese to show up, but as you say, that's an assumption. So my point that you **MAY** need to tweak some settings is not wrong. Beginners especially tend to be tripped up by this, so I think it's more helpful to give them a heads up about it than to lecture me. – Dan Korn Mar 12 '18 at 16:38
  • *"wchar_t values over 255"* equally makes no sense. There is nothing special about the value 255. I also don't understand, why you believe, that the code page were in any way related, when using Unicode output. And if you have a font (like the default console font in Windows 10), that supports the entire UCS-2 range, you will not necessarily be able to see Japanese characters. UCS-2 cannot represent characters outside the BMP. Japanese characters aren't part of the BMP. I'm sorry if this sounds like lecturing, but lecturing you need. Badly. – IInspectable Mar 12 '18 at 21:43
  • This semantics argument is way off of the main point of the original question. But "wchar_t values over 255" means exactly what it says: Unicode characters whose code points are over 255 (0xFF). There's nothing special about them per se, other than the fact that many fonts are limited to 8-bit characters (code points), so those won't work for Japanese and other such languages. So if the OP had a Japanese file name, it probably won't show up correctly in his Command Prompt, even with `std::wcout`, which was my point. – Dan Korn Mar 13 '18 at 00:33
  • Also, your statement that "Japanese characters aren't part of the BMP" is wrong. [This page](http://www.rikai.com/library/kanjitables/kanji_codes.unicode.shtml) shows all the common CJK characters have Unicode code points less than 0xFFFF. There are some ideographs outside that range, but they are rarely used in modern Japanese. – Dan Korn Mar 13 '18 at 00:35
  • Since you still somehow believe, that code pages were in any way involved in Unicode output, the special value you are after is 127. The final ASCII characters. Everything in between 128 and 255 is controlled by the code page. Not relevant with Unicode. But then, you confuse code points with code units, and nothing useful will come out of this, once this confusion is established. Please see [The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)](https://www.joelonsoftware.com/2003/10/08/) to build a solid foundation. – IInspectable Mar 13 '18 at 10:11
  • That's a great article. The author states, "we decided to do everything internally in UCS-2 (two byte) Unicode, which is what Visual Basic, COM, and Windows NT/2000/XP use as their native string type. In C++ code we just declare strings as wchar_t (“wide char”)". I hope you'll set him straight about how his application can't possibly work in Japanese because it doesn't support anything outside of the BMP. – Dan Korn Mar 13 '18 at 16:19
  • Well, that's wrong. Obviously. UCS-2 was all the hotness, when Unicode 1.1 was published in 1993. Neither Windows 2000 nor Windows XP use it as their internal string encoding, though. Both use UTF-16. While a `wchar_t` (on a Windows-compatible compiler) can encode a UCS-2 code point, it is also used with UTF-16 encodings. So the fact, that they used a `wchar_t` is no indication, that they actually used UCS-2. Thanks for pointing out, that Joel can be wrong, too. And it very much sounds like he didn't even notice, when their code started to support UTF-16, because no change was required. – IInspectable Mar 13 '18 at 17:48
  • So my company's application that has been using UCS-2 for Japanese for years can't work either? – Dan Korn Mar 13 '18 at 19:19
  • I wouldn't know of any OS, that supports UCS-2, so I'm not sure whether you know what your company's application does. But that is all besides the point, that this answer is wrong. That simple. – IInspectable Mar 14 '18 at 20:45