Related question: why '?' appears as output while printing Chinese text

I want to fetch some text from a web source (Jira) and later write it to a file using a StreamWriter. The problem concerns the following text, which I obtained from Jira and which is still correct after being read into memory (verified in the debugger):

Deutsch:
Dies ist ein einfacher beispieltext der nur zum spass eingegeben wurde
Japanisch:
これが唯一の楽しみのために入力されたテキストの簡単な例であります
Chinesisch (einfach):
这是文本的一个简单的例子只是为了好玩
The same text in Traditional Chinese follows, but Stack Overflow doesn't accept that text.

If I save the issue that contains this text to a file, this is what ends up in the file:

<description><p>Deutsch:<br/> Dies ist ein einfacher beispieltext der nur zum spass eingegeben wurde<br/> Japanisch:<br/> ã“れãŒå”¯ä¸€ã®æ¥½ã—ã¿ã®ãŸã‚ã«å…¥åŠ›ã•れãŸãƒ†ã‚­ã‚¹ãƒˆã®ç°¡å˜ãªä¾‹ã§ã‚りã¾ã™<br/> Chinesisch (einfach):<br/> 这是文本的一个简å•的例å­åªæ˜¯ä¸ºäº†å¥½çŽ©<br/> Chinesisch (Traditionell):<br/> 這是文本的一個簡單的例å­åªæ˜¯ç‚ºäº†å¥½çŽ©</p></description>
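The garbled output above looks like what appears when UTF-8 bytes are decoded with a single-byte code page: every multi-byte character turns into several unrelated characters. A minimal sketch of that effect follows. It assumes ISO-8859-1 as a stand-in single-byte encoding, and the names `MojibakeDemo`, `MisreadUtf8` and `Repair` are hypothetical, not part of the original code.

```vb
Imports System
Imports System.Text

Module MojibakeDemo
    ' Encodes the text as UTF-8, then decodes the same bytes with another encoding.
    Function MisreadUtf8(text As String, wrongEncoding As Encoding) As String
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes(text)
        Return wrongEncoding.GetString(utf8Bytes)
    End Function

    ' Reverses the misreading: recover the raw bytes and decode them as UTF-8.
    Function Repair(garbled As String, wrongEncoding As Encoding) As String
        Return Encoding.UTF8.GetString(wrongEncoding.GetBytes(garbled))
    End Function

    Sub Main()
        ' Assumption: ISO-8859-1 stands in for whatever single-byte code page is in use.
        Dim singleByte As Encoding = Encoding.GetEncoding("iso-8859-1")
        ' The first two characters of the Chinese sample (U+8FD9, U+662F).
        Dim original As String = ChrW(&H8FD9) & ChrW(&H662F)

        Dim garbled As String = MisreadUtf8(original, singleByte)
        Console.WriteLine(original.Length)   ' 2 characters
        Console.WriteLine(garbled.Length)    ' 6 characters, one per UTF-8 byte
        Console.WriteLine(Repair(garbled, singleByte) = original)
    End Sub
End Module
```

If a file shows this pattern, the bytes on disk may well be valid UTF-8 that is merely being displayed with the wrong encoding, so it is worth checking how the file is opened as well as how it is written.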

This is how I try to write the above to file:

    ' Requires: Imports System.IO
    Dim parts = tempstring.Split(vbCrLf)
    My.Computer.FileSystem.CreateDirectory(ConsoleApplication1.Paths.TEMPDIRECTORY)
    ' No encoding is specified here, so the writer uses its default
    Dim sw As New StreamWriter(OldFilePath, False)
    For Each st In parts
        st = st.Trim()
        'st = st.Replace(vbLf, "")
        ' Some parts start with "-", which has to be removed. The exception is "-->",
        ' the end of an XML comment, which is kept to prevent errors and to allow
        ' analysis of the XML if needed.
        If st.StartsWith("-") AndAlso Not st.StartsWith("-->") Then
            st = st.Substring(1)
        End If
        st = st.Trim()
        sw.WriteLine(st)
    Next
    ' Flush and close the writer so that all lines reach the file
    sw.Close()
lsteinme

1 Answer

While the answer to the linked question suggests that UTF-8 is the default encoding, that was not the case for me. The default encoding on my system was:

    System.Text.SBCSCodePageEncoding
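To check which encodings are in play on a given machine, something like the following sketch can be used. It prints the type of `Encoding.Default` and the type reported by a `StreamWriter` created without an encoding argument; the two are not necessarily the same, and the results depend on the runtime and system settings. The module and function names are hypothetical.

```vb
Imports System
Imports System.IO
Imports System.Text

Module EncodingInspection
    ' Returns the type name of the platform's default encoding.
    Function DefaultEncodingName() As String
        Return Encoding.Default.GetType().FullName
    End Function

    ' Returns the type name of the encoding a StreamWriter uses when none is passed.
    Function WriterEncodingName(filePath As String) As String
        Using sw As New StreamWriter(filePath, False)
            Return sw.Encoding.GetType().FullName
        End Using
    End Function

    Sub Main()
        ' A temporary file is used so that no real data is overwritten.
        Dim tempFile As String = Path.GetTempFileName()
        Console.WriteLine("Encoding.Default:     " & DefaultEncodingName())
        Console.WriteLine("StreamWriter default: " & WriterEncodingName(tempFile))
        File.Delete(tempFile)
    End Sub
End Module
```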

To fix this, I changed all StreamWriters and StreamReaders so that they specify the encoding explicitly. For the code above, instead of:

    Dim sw As New StreamWriter(OldFilePath, False)

the correct call was:

    Dim sw As New StreamWriter(OldFilePath, False, Encoding.UTF8)

That fixed the garbled output for Chinese, Japanese, and some other languages and characters.
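As a minimal sketch of the same idea applied to both writing and reading, the following passes the encoding explicitly on each side and wraps the streams in `Using` blocks so they are closed reliably. The helper names `WriteLines` and `ReadAll` and the temporary file are assumptions made for illustration, not part of the original program.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module ExplicitEncodingDemo
    ' Writes each line to the file using the given encoding (overwrites the file).
    Sub WriteLines(filePath As String, lines As IEnumerable(Of String), enc As Encoding)
        Using sw As New StreamWriter(filePath, False, enc)
            For Each line In lines
                sw.WriteLine(line)
            Next
        End Using
    End Sub

    ' Reads the whole file back using the given encoding.
    Function ReadAll(filePath As String, enc As Encoding) As String
        Using sr As New StreamReader(filePath, enc)
            Return sr.ReadToEnd()
        End Using
    End Function

    Sub Main()
        Dim tempFile As String = Path.GetTempFileName()
        ' The first two characters of the Chinese sample (U+8FD9, U+662F).
        Dim sample As String = ChrW(&H8FD9) & ChrW(&H662F)

        WriteLines(tempFile, New String() {sample}, Encoding.UTF8)
        Dim roundTripped As String = ReadAll(tempFile, Encoding.UTF8).TrimEnd()
        Console.WriteLine(roundTripped = sample)
        File.Delete(tempFile)
    End Sub
End Module
```

Note that a `StreamWriter` created with `Encoding.UTF8` writes a byte order mark at the start of a new file, which many editors use to recognize the file as UTF-8. `New UTF8Encoding(False)` can be passed instead if the mark is not wanted.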

lsteinme