I'm trying to write wstrings (containing Russian text) to a file on Linux, using the following C++ code:

ofstream outWFile;
outWFile.open("input.tab");
outWFile<< WStringToString(w->get_form());
outWFile<<"\t";
outWFile<<WStringToString(w->get_tag());

std::string WStringToString(const std::wstring& s)
{
    std::string temp(s.length(),' ');
    std::copy(s.begin(), s.end(), temp.begin());
    return temp;
}

The contents of input.tab are invalid.

I have tried what is proposed on Stack Overflow, including [Unable to write a std::wstring into wofstream](http://stackoverflow.com/questions/5104329/unable-to-write-a-stdwstring-into-wofstream). However, it didn't help. Thank you in advance.

rok
  • possible duplicate of [Unable to write a std::wstring into wofstream](http://stackoverflow.com/questions/5104329/unable-to-write-a-stdwstring-into-wofstream) – Griwes Aug 14 '12 at 20:35
  • Those `ostream<<` should be `outWFile`? – Etherealone Aug 14 '12 at 20:38
  • Your code doesn't show writing to a file at all, only to the standard output. – bames53 Aug 14 '12 at 21:03
  • How do you expect this code should work? How do you want it to work? What effort did you make to understand your problem before posting this? Did you notice that function `WStringToString` completely spoils non-ASCII content? – Serge Dundich Aug 14 '12 at 21:07

2 Answers


Your conversion function is at fault: it will end up messing up all characters that have a code point of 128/256 or larger (depending on your locale).

Use wcstombs instead (make sure to use a UTF-8 locale).
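A minimal sketch of what that could look like, assuming the program has already selected a UTF-8 locale with `setlocale`. The name `WStringToMultibyte` is made up for illustration and is not part of the question's code; the function just wraps `std::wcstombs` and returns an empty string when a character cannot be represented in the current locale's encoding.

```cpp
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical replacement for WStringToString (name is an assumption).
// Converts using the multibyte encoding of the *current* C locale, so the
// caller must have selected a suitable (e.g. UTF-8) locale beforehand.
std::string WStringToMultibyte(const std::wstring& s)
{
    // MB_CUR_MAX is the maximum number of bytes one character can take
    // in the current locale; +1 leaves room for the terminating null.
    std::vector<char> buf(s.size() * MB_CUR_MAX + 1);

    const std::size_t n = std::wcstombs(buf.data(), s.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::string();  // a character was not representable

    return std::string(buf.data(), n);
}
```

Calling `std::setlocale(LC_ALL, "");` once at program start picks up the locale from the environment (for example `ru_RU.UTF-8`); without it the program stays in the default "C" locale, where the Russian text will most likely fail to convert.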

eq-
  • WStringToString is certainly wrong, but it's not clear that that's being used to write to the file. Also I wouldn't recommend relying on locale based conversions like wcstombs, for the reason you mention; you have to ensure the locale is appropriate for your use, and if it's not it has to be changed. This is not reliable or user friendly. Locale based conversions should only be used when you actually want to use the locale encodings, whatever they are (and this is rare IME). – bames53 Aug 14 '12 at 20:44
  • @bames53: Yes, locales can cause problems. I personally do most of my Linux coding under the assumption (where applicable) that it's used with a UTF-8 locale, as is the case with any modern Linux system, but I guess I could've suggested something less fragile, now that C++ has Unicode tools built-in. – eq- Aug 14 '12 at 20:50
  • Yes, I also just assume UTF-8 always for everything and ignore the locale encoding. The only problems occur for people who deserve them. – bames53 Aug 14 '12 at 20:53
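The locale-independent route discussed in the comments above could be sketched roughly as follows. This is a hand-written UTF-8 encoder rather than a standard library facility, and it assumes each `wchar_t` holds one Unicode code point (UTF-32), which is the case on Linux. The name `WStringToUtf8` is hypothetical.

```cpp
#include <string>

// Hypothetical helper (name is an assumption): encodes a wide string as UTF-8
// without consulting the locale. Assumes wchar_t is UTF-32, as on Linux; with
// a 16-bit wchar_t (UTF-16) surrogate pairs would have to be decoded first.
// Input is not validated (surrogates or values above 0x10FFFF are encoded as-is).
std::string WStringToUtf8(const std::wstring& s)
{
    std::string out;
    out.reserve(s.size() * 2);  // rough guess: Cyrillic takes 2 bytes per character

    for (wchar_t wc : s) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: includes Cyrillic
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this in place, the question's output lines would become something like `outWFile << WStringToUtf8(w->get_form()) << "\t" << WStringToUtf8(w->get_tag());`, and the result no longer depends on which locale the program runs under.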

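I think you would be better off writing the wstring content directly.

// get_tag() is passed to a const std::wstring& in the question, so bind it by reference.
const std::wstring& tag = w->get_tag();
// write() takes a const char*, hence the cast; data() exists on both string and wstring.
outWFile.write(reinterpret_cast<const char*>(tag.data()), tag.size() * sizeof(wchar_t));
// Note: this writes the raw wchar_t bytes (4 bytes per character on Linux), not UTF-8.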
Etherealone