
In the old days, we could specify any character with Chr(56).

For example, say the character is unprintable and we want to put it in a string. We just do:

Dim a As String = Chr(56)

Now we have UTF-8, Unicode, or some other encoding.

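Say I want variable a to contain one of these characters:

  en space               U+2002  (&#8194;)
  em space               U+2003  (&#8195;)
  thin space             U+2009  (&#8201;)
  zero width non-joiner  U+200C  (&#8204;)
  zero width joiner      U+200D  (&#8205;)
  left-to-right mark     U+200E  (&#8206;)
  right-to-left mark     U+200F  (&#8207;)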

In fact, say I want to write a function that removes all such characters from my string.

How would I do that?

I want the function to leave Chinese, Korean, and Japanese characters intact and remove only these really obscure ones.

Asked by user4951; edited by Steven Doggart

3 Answers

''' <summary>
''' This function replaces 'smart quotes' and the en dash (U+2018, U+2019, U+201C, U+201D, U+2013,
''' i.e. ANSI 145, 146, 147, 148, 150) with their ASCII versions (ASC 39, 34, 45),
''' and replaces any other non-ASCII character with "?"
''' </summary>
''' <param name="expression">The string to convert.</param>
''' <returns>A string containing only ASCII characters.</returns>
Public Function Unicode2ASCII(ByVal expression As String) As String
  If expression Is Nothing Then Return String.Empty
  Dim sb As New System.Text.StringBuilder(expression.Length)
  For Each c As Char In expression
    ' AscW returns the UTF-16 code unit. Asc would return the ANSI code page value,
    ' which is 63 ("?") for characters such as U+2002, so they would slip through.
    Select Case AscW(c)
      Case &H2018, &H2019 ' left/right single quotation marks
        sb.Append("'"c)
      Case &H201C, &H201D ' left/right double quotation marks
        sb.Append(""""c)
      Case &H2013 ' en dash
        sb.Append("-"c)
      Case Is > 127
        sb.Append("?"c)
      Case Else
        sb.Append(c)
    End Select
  Next
  Return sb.ToString()
End Function

Or, to add them to a string:

Dim s As String = "a" & ChrW(8194) & "b"
MsgBox(s)
SSS
  • I really do not think this will work. All you do is look at Asc. We're talking about far more special characters than just these. – user4951 May 23 '12 at 04:36
  • Actually, if you change to AscW() you can strip out or replace the characters you want. Unless you are talking about adding them, in which case use Char.ConvertFromUtf32() or ChrW(). – SSS May 23 '12 at 04:45
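Building on the AscW idea, here is a minimal sketch of a different technique: filtering by Unicode category instead of by a fixed list of codes. It assumes that every "format" character (category Cf, which includes ZWNJ, ZWJ, LRM and RLM) should be dropped and that every space separator (category Zs, which includes en space, em space and thin space) should become an ordinary space. The module and function names are hypothetical.

```vbnet
Imports System.Globalization
Imports System.Text

Module InvisibleCharFilter
    ' Hypothetical helper: removes format characters (category Cf, e.g. ZWNJ,
    ' ZWJ, LRM, RLM) and replaces space separators (category Zs, e.g. en space,
    ' em space, thin space) with an ordinary ASCII space.
    Public Function NormalizeInvisibles(ByVal text As String) As String
        If text Is Nothing Then Return String.Empty
        Dim sb As New StringBuilder(text.Length)
        For Each c As Char In text
            Select Case Char.GetUnicodeCategory(c)
                Case UnicodeCategory.Format
                    ' Zero-width and directional marks: drop them
                Case UnicodeCategory.SpaceSeparator
                    ' Any space-like character becomes U+0020
                    sb.Append(" "c)
                Case Else
                    ' Letters of every script (CJK ideographs are category Lo),
                    ' digits, punctuation and surrogate halves pass through
                    sb.Append(c)
            End Select
        Next
        Return sb.ToString()
    End Function
End Module
```

Because Chinese, Japanese and Korean ideographs are letters, they come through unchanged. One caveat: the ideographic space U+3000 is also in category Zs, so it would be turned into an ASCII space too; add a special case if that matters for your text.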

Replace removes whatever you want. ChrW produces a Unicode character from its code (to produce characters outside Unicode Plane 0, you need to concatenate two Chars, i.e. a surrogate pair).

Something like:

Dim cleaned As String = Replace("My text", ChrW(8194), "")
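To strip every character from the question in one pass, the same Replace idea can be applied in a loop. A minimal sketch, assuming the seven code points listed in the question are the complete set you care about (the module, field and function names are hypothetical):

```vbnet
Module InvisibleCharList
    ' Hypothetical list: the code points of the characters named in the question
    ' (en space, em space, thin space, ZWNJ, ZWJ, LRM, RLM). Extend as needed.
    Private ReadOnly InvisibleCodePoints As Integer() = {&H2002, &H2003, &H2009, &H200C, &H200D, &H200E, &H200F}

    ' Hypothetical helper: removes every listed character and leaves all
    ' other characters (including CJK text) untouched.
    Public Function RemoveListedChars(ByVal text As String) As String
        If text Is Nothing Then Return String.Empty
        For Each code As Integer In InvisibleCodePoints
            ' ChrW turns the code into a Char; String.Replace with an
            ' empty replacement deletes every occurrence of it.
            text = text.Replace(ChrW(code).ToString(), String.Empty)
        Next
        Return text
    End Function
End Module
```

Since only the listed characters are touched, anything not on the list survives, which matches the requirement to keep Chinese, Korean and Japanese text intact.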
Alexei Levenkov
  • Are you sure? I thought Unicode contains far more than 65k characters, and ChrW only handles around 65k characters. – user4951 May 23 '12 at 04:48
  • Strings are UTF-16; if you need other Unicode characters outside Plane 0, you just need to concatenate 2 Chars to form the whole Unicode character. Check this http://stackoverflow.com/questions/697055/c-sharp-and-utf-16-characters and the description of planes at http://en.wikipedia.org/wiki/Plane_%28Unicode%29 – Alexei Levenkov May 23 '12 at 16:45
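A small sketch of what "concatenate 2 Chars" looks like in practice, using U+1F600 (GRINNING FACE) purely as an example of a character outside Plane 0. The module and function names are hypothetical; Char.ConvertFromUtf32 is the .NET method mentioned in SSS's comment.

```vbnet
Module SurrogateDemo
    ' U+1F600 (GRINNING FACE) is outside Plane 0, so no single Char can hold it;
    ' ChrW only accepts values up to 65535.

    ' Builds the character by concatenating its two UTF-16 code units.
    Public Function FromSurrogatePair() As String
        ' High surrogate U+D83D followed by low surrogate U+DE00
        Return ChrW(&HD83D) & ChrW(&HDE00)
    End Function

    ' Lets .NET compute the surrogate pair from the code point.
    Public Function FromCodePoint() As String
        Return Char.ConvertFromUtf32(&H1F600)
    End Function
End Module
```

Both functions return the same two-Char String, and Char.ConvertToUtf32(s, 0) turns it back into the code point &H1F600.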

It seems like there ought to be a better way, but the best I can come up with that would work in all situations would be something like this:

Private Function getString(ByVal xmlCharacterCode As String) As String
    ' Requires a reference to System.Xml
    Dim doc As New System.Xml.XmlDocument()
    doc.LoadXml("<?xml version=""1.0"" encoding=""utf-8""?><test>" & xmlCharacterCode & "</test>")
    Return doc.InnerText
End Function

And then use it like this:

myString = myString.Replace(getString("&#8194;"), "")
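On .NET Framework 4.0 or later, System.Net.WebUtility.HtmlDecode may be a lighter way to turn a numeric character reference into a character than building an XmlDocument. A minimal sketch (the module and function names are hypothetical):

```vbnet
Imports System.Net

Module EntityDecode
    ' Hypothetical helper: decodes a character reference such as "&#8194;"
    ' (or "&#x2002;") into the character itself.
    Public Function FromEntity(ByVal entity As String) As String
        Return WebUtility.HtmlDecode(entity)
    End Function
End Module
```

Usage mirrors the line above: myString = myString.Replace(EntityDecode.FromEntity("&#8194;"), ""). For a fixed character you can also skip decoding entirely and use ChrW(8194).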

Also, you may want to take a look at this page I found:

Easy way to convert &#XXXX; from HTML to UTF-8 xml either programmaticaly in .Net or using tools

Steven Doggart