
I have a string in VB.net that may contain something like the following:

This is a 0x000020AC symbol

According to this article, 0x000020AC is the UTF-32 encoding of the Euro symbol: http://www.fileformat.info/info/unicode/char/20ac/index.htm

I'd like to convert this into

This is a € symbol

I've tried using the UnicodeEncoding class in VB.net (.NET Framework 2.0, as I'm modifying a legacy application).

When I use this class to encode and then decode, I still get back the original string.

I expected UnicodeEncoding to recognise the already-encoded part and not encode it again, but that doesn't appear to be the case.

I'm a little lost now as to how I can convert a partially encoded string into a normal string.

Background: when an Excel spreadsheet is saved as CSV, anything outside the ASCII range is converted to `?`. My idea is to have my client search and replace a few characters, such as the Euro symbol, with an encoded string such as 0x000020AC, and then to convert those encoded parts back into the real symbols before inserting the data into a SQL database.

I've tried a function like this:

    Public Function Decode(ByVal s As String) As String
        Dim uni As New UnicodeEncoding()
        Dim encodedBytes As Byte() = uni.GetBytes(s)
        Dim output As String = ""

        output = uni.GetString(encodedBytes)

        Return output
    End Function

It was based on the examples on MSDN at http://msdn.microsoft.com/en-us/library/system.text.unicodeencoding.aspx
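For what it's worth, the round trip described above is expected behaviour: `UnicodeEncoding` only converts between .NET strings and UTF-16 bytes, so the text `0x000020AC` is treated as ten ordinary characters, and `GetString` simply undoes `GetBytes`. A minimal console sketch of that (the module name `RoundTripDemo` is illustrative only):

```vbnet
Imports System
Imports System.Text

Module RoundTripDemo
    Sub Main()
        ' Default constructor: little-endian UTF-16
        Dim uni As New UnicodeEncoding()
        Dim original As String = "This is a 0x000020AC symbol"

        ' Every character, including '0', 'x', '2', 'A' and 'C', becomes two bytes;
        ' GetBytes does not prepend a byte order mark
        Dim bytes As Byte() = uni.GetBytes(original)
        Console.WriteLine(bytes.Length) ' 54 (27 characters * 2 bytes)

        ' GetString reverses GetBytes exactly, so the hex text comes back untouched
        Console.WriteLine(uni.GetString(bytes) = original) ' True
    End Sub
End Module
```

Interpreting the `0x` text as a code point therefore has to be a separate parsing step.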

It could be that I completely misunderstand how this works in VB.net. In C# I can simply use an escape sequence such as "\u20AC", but no such thing exists in VB.net.
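For reference, VB.NET string literals have no `\u` escape, but the same character can be built at run time with `ChrW`, or with `Char.ConvertFromUtf32` (available since .NET 2.0), which also accepts code points above U+FFFF. A minimal sketch; the module name `EscapeDemo` is illustrative only:

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module EscapeDemo
    Sub Main()
        ' ChrW builds a Char from a single UTF-16 code unit
        Dim euro As Char = ChrW(&H20AC)
        Dim s1 As String = "This is a " & euro & " symbol"

        ' Char.ConvertFromUtf32 takes a full code point and returns a String
        ' (a surrogate pair for values above &HFFFF)
        Dim s2 As String = "This is a " & Char.ConvertFromUtf32(&H20AC) & " symbol"

        Console.WriteLine(s1 = s2) ' True
    End Sub
End Module
```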

Elarys
  • I would use [Regex.Replace](http://msdn.microsoft.com/en-us/library/ht1sxswy(v=vs.80).aspx) to match `0x...` and use the custom match evaluator to convert the matched value back into a character. – Heinzi Aug 02 '12 at 10:49
  • possible duplicate of [How to represent Unicode Chr Code in VB.Net String literal?](http://stackoverflow.com/questions/3144053/how-to-represent-unicode-chr-code-in-vb-net-string-literal) – Hans Passant Aug 02 '12 at 11:27

2 Answers


Based on Heinzi's advice, I implemented a Regex.Replace approach with the following code; it appears to work for my examples.

    Imports System.Text
    Imports System.Text.RegularExpressions

    Public Function Decode(ByVal s As String) As String
        ' Match "0x" followed by exactly eight hex digits, e.g. 0x000020AC
        Dim r As New Regex("0x[0-9a-fA-F]{8}")

        ' Replace every match with the text returned by HexToString
        Dim myEvaluator As New MatchEvaluator(AddressOf HexToString)
        Return r.Replace(s, myEvaluator)
    End Function

    Public Function HexToString(ByVal hexString As Match) As String
        ' Big-endian UTF-16 decoder
        Dim uni As New UnicodeEncoding(True, True)

        ' Drop the "0x" prefix and leading zeros: "0x000020AC" -> "20AC"
        Dim input As String = hexString.Value.Substring(2).TrimStart("0"c)

        ' Pad to an even number of digits, and to at least four (one UTF-16
        ' code unit), so that e.g. 0x00000041 decodes to "A"
        If input.Length Mod 2 = 1 Then input = "0" & input
        input = input.PadLeft(4, "0"c)

        ' Convert each pair of hex digits into one byte
        Dim bytes(input.Length \ 2 - 1) As Byte
        For i As Integer = 0 To bytes.Length - 1
            bytes(i) = Convert.ToByte(input.Substring(i * 2, 2), 16)
        Next

        ' Only correct for code points up to U+FFFF; larger values need a surrogate pair
        Return uni.GetString(bytes)
    End Function
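A shorter variant of the evaluator, offered as a sketch rather than as the code above: since the eight hex digits after `0x` already form a complete code point, `Convert.ToInt32(hex, 16)` followed by `Char.ConvertFromUtf32` (both available in .NET 2.0) can turn them straight into a string, which also covers code points above U+FFFF. The module name `HexDecodeSketch` and the choice to leave invalid values untouched are assumptions for illustration:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module HexDecodeSketch
    ' Matches "0x" followed by exactly eight hex digits, e.g. 0x000020AC
    Private ReadOnly HexPattern As New Regex("0x[0-9a-fA-F]{8}")

    Public Function Decode(ByVal s As String) As String
        Return HexPattern.Replace(s, New MatchEvaluator(AddressOf CodePointToString))
    End Function

    Private Function CodePointToString(ByVal m As Match) As String
        ' Parse the eight digits after "0x" as a base-16 number;
        ' values of 0x80000000 and above come back negative
        Dim codePoint As Integer = Convert.ToInt32(m.Value.Substring(2), 16)

        ' ConvertFromUtf32 throws for negative values, surrogates and values
        ' above &H10FFFF, so leave such matches unchanged (an assumption)
        If codePoint < 0 OrElse codePoint > &H10FFFF OrElse _
           (codePoint >= &HD800 AndAlso codePoint <= &HDFFF) Then
            Return m.Value
        End If

        Return Char.ConvertFromUtf32(codePoint)
    End Function

    Sub Main()
        Console.WriteLine(Decode("This is a 0x000020AC symbol"))
    End Sub
End Module
```

Leaving out-of-range matches as-is, rather than throwing, is only one possible design choice; rejecting the whole input may suit a database import better.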
Elarys

Have you tried:

    Public Function Decode(ByVal Coded As String) As String
        Return StrConv(Coded, vbUnicode)
    End Function

Also, your function has a bug: it takes s as an argument, does some processing, and then returns the original s instead of the processed result.

Pharap
  • I saw that error in the question, sorry about that; I tried many versions of that script before posting this one. Unfortunately, vbUnicode was dropped in .NET and exists only in VB6, so that won't work for me. – Elarys Aug 02 '12 at 11:49
  • What about using System.Text.Encoding.Convert to convert the string's byte array? http://msdn.microsoft.com/en-us/library/system.text.encoding.convert(v=vs.71).aspx – Pharap Aug 02 '12 at 11:59
  • I managed to find a way of doing it using Heinzi's earlier comment: Regex with a custom match evaluator, combined with a HexToString function. I'll post the code myself once I've tidied it up a bit. – Elarys Aug 02 '12 at 13:00