4

I read data from a UTF-8 encoded file. Part of the content of this file is then used as the name of a newly created folder. However, my folder name is:

bohou�_120328 instead of bohouš_120328

How can I set the proper encoding for the name of the newly created folder? Thanks.

edit:

I am reading information from the file this way:

System.IO.StreamReader file = new System.IO.StreamReader(nameOfFile);

while ((j = file.ReadLine()) != null) {
    // manipulating string in j
}

and then creating the directory with

if (Directory.Exists(folder) == false) {
    Console.WriteLine("creating directory " + folder);
    System.IO.Directory.CreateDirectory(@folder);
}

If I run my application on my 64-bit Windows 7 computer, everything is fine. However, if I run it on other computers with older systems like WinXP, the encoding is wrong and the name looks like this:

bohou�_120328

Before using the variable to create the folder, I write it to the output, and everything looks fine. Even the folder names are fine. But only on my computer, unfortunately.

edit2:

Things are getting even weirder. I used the code from "How do I remove diacritics (accents) from a string in .NET?" to remove the diacritics, because names without diacritics are fine for me.

However, again:

  1. running the code on my computer yields bohous_120328
  2. running the code on other computers AND from my flash disk yields bohou�_120328

I swear it is the same code, as I COPIED my .exe file.

The debugger shows that the problem is already in my string variable before the folder is created. I do not understand how the environment influences my variables in this case.

I would be grateful for an explanation :-)

Perlnika

5 Answers

6

On Windows, you do not specify the encoding of file or directory names. On NTFS they are always encoded with what is essentially UTF-16. As long as you read in the string correctly, CreateDirectory will do what you want. I suspect that you either didn't read your UTF-8 file as UTF-8, or your file isn't actually UTF-8. Check in the debugger what the string value is before you call CreateDirectory with it.
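To make that check less dependent on how a console or debugger window renders the text, one option is to print the numeric value of every character. The following is a minimal sketch; the class and method names (StringInspector, DumpCodeUnits) are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.Text;

static class StringInspector
{
    // Formats each UTF-16 code unit of s as U+XXXX, separated by spaces,
    // so that a wrongly decoded character is visible regardless of fonts.
    public static string DumpCodeUnits(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

Calling Console.WriteLine(StringInspector.DumpCodeUnits(folder)) just before CreateDirectory should show U+0161 for a correctly decoded š. If U+FFFD (the Unicode replacement character) appears instead, the damage happened while reading the file, not while creating the folder.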

Jason Malinowski
4

I think you are reading the file incorrectly. You should check the text you read first.

Euphoric
  • You were right. On some computers it read the file properly, on some not. It was the reading issue. – Perlnika May 11 '12 at 09:52
1

My suspicion is that this has nothing to do with how your code is reading the text or how it is being written out as a directory name. I'll bet that it's a limitation of the OS or partition type that you're creating the directory in. My guess is either the OS/partition can't handle the character with the diacritic, or it is being written correctly only to be displayed incorrectly.

This article gives some info on how to extend your file system to allow for diacritic characters (for NTFS, anyway):

http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_behavior.mspx

There may be similar info for other partition types (you still haven't answered sixlettervariables' comment asking what partition type you're using).

jdmcnair
1

The default encoding of a StreamReader is UTF-8. If your file is not UTF-8 encoded, you can never be sure of getting the correct characters on localized versions of the operating system.

I mean :

  • File content as UTF-8 & new StreamReader(path) : encoding match : no problem

  • File content as UTF-8 & new StreamReader(path, Encoding.Default) : partial match, only chars corresponding to the current OS codepage will be correctly decoded

  • File content as ANSI (default on Windows) & new StreamReader(path) : encoding mismatch, AFAIK only ASCII chars will be decoded

  • File content as ANSI & new StreamReader(path, Encoding.Default) : partial match, only chars corresponding to the current OS codepage will be correctly decoded

Checking your file encoding and the OS default codepage may help you to find the issue.
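The mismatch cases above can be reproduced without any file by decoding fixed byte sequences. This is only a sketch: the names (DecodingDemo, Utf8Bytes, AnsiBytes, Decode) are invented for the example, Windows-1250 is an assumption about which ANSI code page the file might use, and ISO-8859-1 stands in for "some single-byte code page" because it is available on every .NET runtime.

```csharp
using System;
using System.Text;

static class DecodingDemo
{
    // "bohouš" encoded as UTF-8: š (U+0161) becomes the two bytes C5 A1.
    public static readonly byte[] Utf8Bytes = { 0x62, 0x6F, 0x68, 0x6F, 0x75, 0xC5, 0xA1 };

    // "bohouš" in a single-byte ANSI code page such as Windows-1250
    // (assumed here): š is the single byte 9A.
    public static readonly byte[] AnsiBytes = { 0x62, 0x6F, 0x68, 0x6F, 0x75, 0x9A };

    // Decodes the given bytes with the given encoding.
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        return encoding.GetString(bytes);
    }
}
```

Decode(Utf8Bytes, Encoding.UTF8) gives "bohouš". Decode(AnsiBytes, Encoding.UTF8) gives "bohou" followed by U+FFFD, because a lone 9A byte is not valid UTF-8; that is the � seen in the question. Decode(Utf8Bytes, Encoding.GetEncoding("iso-8859-1")) gives "bohouÅ¡", the other typical kind of corruption.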

JoeBilly
1

StreamReader attempts to detect the encoding from a byte order mark and falls back to UTF-8 only if none is found.

I would suggest passing Encoding.UTF8 to the constructor explicitly.
If this doesn't help, my guess is that your file content is not really UTF-8, and you are dependent on the computer's regional settings.
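A minimal sketch of that suggestion, assuming the file really is UTF-8. The names FolderCreator, ReadLinesAsUtf8 and EnsureDirectory are hypothetical. The throwOnInvalidBytes flag is an addition not mentioned above: it uses the UTF8Encoding(bool, bool) constructor so that a file which is not valid UTF-8 raises an exception instead of silently producing U+FFFD.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

static class FolderCreator
{
    // Reads all lines of the file as UTF-8, independent of regional settings.
    // With throwOnInvalidBytes = true, invalid UTF-8 input throws a
    // DecoderFallbackException instead of being replaced with U+FFFD.
    public static List<string> ReadLinesAsUtf8(string path, bool throwOnInvalidBytes)
    {
        var encoding = new UTF8Encoding(false, throwOnInvalidBytes);
        var lines = new List<string>();
        using (var reader = new StreamReader(path, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // Creates the directory if it does not exist yet.
    public static void EnsureDirectory(string folder)
    {
        if (!Directory.Exists(folder))
        {
            Directory.CreateDirectory(folder);
        }
    }
}
```

Running ReadLinesAsUtf8(nameOfFile, true) once on a machine that shows the problem would tell you whether the file on that machine is actually UTF-8.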

eitanpo