
I am trying to convert a string encoded in UTF-8 to windows-1255 in VB.NET with no luck. Admittedly, I don't know VB but have tried using an example at MSDN and modifying it to my needs:

Public Function Utf82Hebrew(ByVal Str As String) As String
    Dim ascii As Encoding = Encoding.GetEncoding("windows-1255")
    Dim unicode As Encoding = Encoding.Unicode

    ' Convert the string into a byte array. 
    Dim unicodeBytes As Byte() = unicode.GetBytes(Str)

    ' Perform the conversion from one encoding to the other. 
    Dim asciiBytes As Byte() = Encoding.Convert(unicode, ascii, unicodeBytes)

    ' Convert the new byte array into a char array and then into a string. 
    Dim asciiChars(ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length)-1) As Char
    ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0)
    Dim asciiString As New String(asciiChars)

    Utf82Hebrew = asciiString
End Function

This function doesn't actually do anything: the string remains in UTF-8. However, if I change this line:

Dim ascii As Encoding = Encoding.GetEncoding("windows-1255")

To this:

Dim ascii As Encoding = Encoding.ASCII

Then the function returns question marks in the place of the string.

Does anyone know how to properly convert a UTF-8 string to a specific encoding (in this case, windows-1255), and/or what I'm doing wrong in the above code?

Thanks in advance.

Ynhockey
  • What text are you trying to convert? – Sam Apr 28 '13 at 14:28
  • It can be any string in Hebrew that's input in a web form. Example: שלום – Ynhockey Apr 28 '13 at 14:35
  • There is no such thing as "utf-8 string", strings are always encoded in utf-16 in .NET. Utf-8 can only be stored in byte[]. After you got utf-8 bytes into a string somehow, the original data is destroyed beyond repair, utf-8 contains byte values that don't have a utf-16 representation. You will need to fix this problem at its root and fix the code that generated the "Str" argument. – Hans Passant Apr 28 '13 at 15:28
  • possible duplicate of [How to convert a UTF-8 string into Unicode?](http://stackoverflow.com/questions/11293994/how-to-convert-a-utf-8-string-into-unicode) – Hans Passant Apr 28 '13 at 15:29
  • A `System.String` is always UTF-16 in .net. A Utf-8 string would be represented as a byte array in .net. – CodesInChaos Apr 28 '13 at 16:06
  • Thank you for the comments. The information about how .NET stores string data was helpful for understanding the problem, and will help with similar issues in the future. However, I am still unable to solve the underlying issue, so let me rephrase the question: How do I convert from any encoding to windows-1255? Theoretically the above code should do it since it converts the existing string to bytes before doing other manipulations, but it's not working. – Ynhockey Apr 29 '13 at 08:46
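
As a rough illustration of the point made in the comments above (encodings apply to byte arrays, not to `String` values), here is a minimal sketch of re-encoding raw bytes into windows-1255. The names `EncodingSketch` and `ToWindows1255` are hypothetical and do not come from the thread. It assumes the classic .NET Framework, where the windows-1255 code page is available by default; on .NET Core and later, the code-page provider from `System.Text.Encoding.CodePages` would have to be registered first.

```vbnet
Imports System
Imports System.Text

Module EncodingSketch
    ' Hypothetical helper: takes bytes in the named source encoding
    ' and returns the same text encoded as windows-1255 bytes.
    Public Function ToWindows1255(ByVal sourceBytes As Byte(), ByVal sourceName As String) As Byte()
        Dim source As Encoding = Encoding.GetEncoding(sourceName)
        Dim hebrew As Encoding = Encoding.GetEncoding("windows-1255")
        Return Encoding.Convert(source, hebrew, sourceBytes)
    End Function

    Sub Main()
        ' Simulate UTF-8 input (two bytes per Hebrew letter).
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes("שלום")
        Dim hebrewBytes As Byte() = ToWindows1255(utf8Bytes, "utf-8")

        ' windows-1255 stores each Hebrew letter in a single byte.
        Console.WriteLine(BitConverter.ToString(hebrewBytes)) ' F9-EC-E5-ED
    End Sub
End Module
```

The result is deliberately kept as a `Byte()`: decoding it back into a `String` would return UTF-16 text again, so the windows-1255 form only exists while the data is bytes (for example, when written to a file or a response stream).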

1 Answer


I modified your code. It is very straightforward to convert text from one encoding into another. This is how you should do it in VB.NET. Microsoft Windows file encoding is 1252, not 1255.

Public Function Utf82Hebrew(ByVal Str As String) As String
    Dim ascii As System.Text.Encoding = System.Text.Encoding.GetEncoding("1252")
    Dim unicode As System.Text.Encoding = System.Text.Encoding.Unicode

    ' Convert the string into a byte array.
    Dim unicodeBytes As Byte() = unicode.GetBytes(Str)

    ' Perform the conversion from one encoding to the other.
    Dim asciiBytes As Byte() = System.Text.Encoding.Convert(unicode, ascii, unicodeBytes)

    ' Decode the new byte array back into a string.
    Dim asciiString As String = ascii.GetString(asciiBytes)

    Utf82Hebrew = asciiString
End Function

Meisam Rasouli
  • Hi, thanks for answering. Admittedly this is really old and not relevant for me personally anymore, but just so it's helpful to others: I don't understand the nature of your change. You just changed the target encoding, but the idea is to specifically change to windows-1255. Otherwise it's not useful. Also as I remember this code, changing encodings didn't do anything, except to Encoding.ASCII. – Ynhockey Dec 12 '20 at 22:36
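
The behaviour described in the last comment can be reproduced with a small round trip. The sketch below is only an illustration (the module name `RoundTripSketch` is hypothetical) and assumes the .NET Framework, where windows-1255 is available without extra setup. Encoding a string to bytes and decoding those bytes with the same encoding gives back the original UTF-16 string whenever the encoding can represent every character, which is why the function appears to do nothing. `Encoding.ASCII` cannot represent Hebrew letters, so its default fallback substitutes `?` for each one.

```vbnet
Imports System
Imports System.Text

Module RoundTripSketch
    Sub Main()
        Dim original As String = "שלום"
        Dim hebrew As Encoding = Encoding.GetEncoding("windows-1255")

        ' Encode to windows-1255 bytes, then decode with the same encoding:
        ' the result is the same UTF-16 string we started with.
        Dim viaHebrew As String = hebrew.GetString(hebrew.GetBytes(original))
        Console.WriteLine(viaHebrew = original) ' True

        ' ASCII has no Hebrew letters, so each one is replaced by "?".
        Dim viaAscii As String = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(original))
        Console.WriteLine(viaAscii) ' ????
    End Sub
End Module
```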