6

How do I remove all special characters which don't fall under ASCII category in VBA?

These are some of the symbols which appear in my string.

Œ œ Š š Ÿ ƒ

There are many more such characters.

These don't belong to ASCII category as you can see here http://www.ascii.cl/htmlcodes.htm

I tried something like this

strName = Replace(strName, ChrW(376), " ")
Community
  • 1
  • 1
Jeevan
  • 447
  • 2
  • 8
  • 19
  • Terminology: As you know, Unicode is a superset of ASCII. However, the ASCII characters are in what is called a "block" not a [Unicode category](http://www.fileformat.info/info/unicode/category/index.htm); specifically the [C0 Controls and Basic Latin](http://unicode.org/charts/nameslist/) block. – Tom Blodget May 04 '16 at 16:19
  • FYI The ChrW CharCode argument is a `Long` that identifies a character, but doesn't allow values greater than `65535` (hex value `&HFFFF`). It might be of some practical interest to find systematic ways to overcome this restriction - c.f. six solutions at [Get UniCode characters with CharCode greater hex FFFF](https://stackoverflow.com/questions/56008191/get-unicode-characters-with-charcode-values-greater-hex-ffff), e.g. in Unicode block `1F300-1F5FF` "Miscellaneous symbols and pictographs" :-) – T.M. Nov 22 '19 at 15:17

6 Answers6

8

Would a RegEx solution be of interest to you?

There are plenty of examples for different languages on this site - here's a C# one: How can you strip non-ASCII characters from a string? (in C#).

Try this for VBA:

Private Function GetStrippedText(txt As String) As String
    Dim regEx As Object

    Set regEx = CreateObject("vbscript.regexp")
    regEx.Pattern = "[^\u0000-\u007F]"
    GetStrippedText = regEx.Replace(txt, "")

End Function
Community
  • 1
  • 1
Ambie
  • 4,872
  • 2
  • 12
  • 26
  • 1
    Hey Thanks a lot, it works. Not sure of all conditions and all special Unicode characters, but for now it works well. It just takes too much time for processing. – Jeevan May 04 '16 at 10:32
  • 1
    If you're converting many strings eg in a loop, then try defining the pattern once and just calling the `.Replace` function inside your loop. That might help with speed. – Ambie May 04 '16 at 10:37
  • Thank you :) That did not reduce the time much but anyway it helped to reduce a little bit of time. – Jeevan May 04 '16 at 10:41
  • Does this cover all the unicode or just 0 to 7F? Will this work for all boundary conditions? Please let me know. – Jeevan May 04 '16 at 17:32
  • It will replace any character not in the Unicode range 0 to 127. – Ambie May 06 '16 at 11:20
  • Need to add this option in: regEx.Global = True – Frank Sep 20 '20 at 19:35
  • Only remove one Unicode character, not all of them in a string. – joehua Nov 19 '20 at 23:45
3

Try with below

Function ClearUnwantedString(fulltext As String) As String
    Dim output As String
    Dim character As String
    For i = 1 To Len(fulltext)
        character = Mid(fulltext, i, 1)
        If (character >= "a" And character <= "z") Or (character >= "0" And character <= "9") Or (character >= "A" And character <= "Z") Then
            output = output & character
        End If
    Next
    ClearUnwantedString = output
End Function

Sub test()
    a = ClearUnwantedString("dfjŒœŠdskl")
End Sub
Karthick Gunasekaran
  • 2,697
  • 1
  • 15
  • 25
  • I am using something similar, but the moto of my code is not this. It should allow special character which are ASCII, but only remove special character which doesn't come under ASCII code of 0 to 255. – Jeevan May 04 '16 at 10:11
  • My String looks like LPAIF in excel sheet, but there is a Ÿ char hidden in front which is not visible in excel, but it exist, so the string is ŸLPAIF. Some solutions given above worked when I gave "ŸLPAIF" as a static string, but not when read from excel directly. Anyway we have a solution now, thanks for trying to help. – Jeevan May 04 '16 at 10:35
  • I tried hard removing strange characters from my string in VBA and this code was very helpfull for me – rustyBucketBay Sep 10 '19 at 21:07
  • It removes all unicode characters in a string. However, it also removes spaces. – joehua Nov 19 '20 at 23:46
3

Don't need to loop each character

Maybe late, but maybe it helps someone:

Public Function StripNonAsciiChars(ByVal InputString As String) As String
    Dim i As Integer
    Dim RegEx As Object
    Set RegEx = CreateObject("VBScript.RegExp")
    With RegEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = "[^\u0000-\u007F]"
        StripNonAsciiChars = Application.WorksheetFunction.Trim(RegEx.Replace(InputString, " "))
    End With
End Function
Sorin GFS
  • 477
  • 3
  • 18
  • wow... Thanks. it work for me without Application.WorksheetFunction.Trim. Worksheet function give me error. – Mabaega Sep 05 '20 at 10:57
  • @Mabaega Ofc, you may play with the `Replace` function too, e.g. you may put dashes, or leave the spaces, this is up to your needs. Lately, starting with the ClickToRun aproatch of MS Office gives 1004 error for the base excel functions in VBA... – Sorin GFS Sep 05 '20 at 14:20
1

What do you get when you write the following in the immediate window?

?Replace("ŸŸŸŸ", ChrW(376), "ale")

I get: alealealeale

Vityata
  • 42,633
  • 8
  • 55
  • 100
  • Yes It prints the same for me. I am having Ÿ as special character, but when passed as a string inside Replace method strName = Replace(strName, ChrW(376), " "), where strName is initially ŸLPAIF becomes ?LPAIF. This string then goes to a write to file method, where the code crashes saying "Run-time error '5' Invalid procedure call or argument". I think that ? newly produced is also not an ordinary question mark, but some special character. I want to replace it in my dynamic string, not in a static one as "ŸŸŸŸ" you shown. – Jeevan May 04 '16 at 10:09
  • Hi, I think it is working in Debug.Print but when using actual value from cell where Ÿ is present, it doesn't do that. – Jeevan May 04 '16 at 10:20
0

try application.clean()

it will remove all unprintable characters

Rosetta
  • 2,665
  • 1
  • 13
  • 29
  • Can you please give me an example for it. I am not able to find any application.clean() in the autotext code in vba. – Jeevan May 04 '16 at 09:51
  • Please take a look at this http://pasteboard.co/FtkoMrB.png I also have tried "strName = Clean(strName)" and they did not work. – Jeevan May 04 '16 at 09:55
  • Try with Application.WorksheetFunction.Clean("üäöaŠsd"). But it will not help you, it is only for non-printable characters. And yours are printable. – Vityata May 04 '16 at 09:59
  • @Vityata You are right. This is just a test to see if you are paying attention xD :P But @Jeevan `.clean` exist under `worksheetfunction` but it can also be called under `application`. I'll just leave the answer here for reference. – Rosetta May 04 '16 at 11:29
  • I did not get exactly what you mean. The "application.clean" does not show with Intellisense, but with "application.worksheetfunction.clean" it is there. – Vityata May 04 '16 at 11:33
  • see this http://answers.microsoft.com/en-us/office/forum/office_2003-customize/applicationxlfunc-v-worksheetfunctionxlfunc/2f05c80c-7e2e-4a4f-a638-cb0d03d9a234?auth=1 – Rosetta May 04 '16 at 11:42
0

Supposing you have:

enter image description here

Then the following code will get the String from A1 and let through only ANSI (code 0 to 255) in A2.

Sub test()
 Dim s1 As String, s2 As String, c As String, i As Long, iAsc As Integer

 s1 = Range("A1").Value

 s2 = ""

 For i = 1 To Len(s1)
  c = Mid(s1, i, 1)
  iAsc = AscW(c)
  If iAsc <= 255 Then
   s2 = s2 & c
  End If
 Next

 Range("A2").Value = s2

End Sub
Axel Richter
  • 56,077
  • 6
  • 60
  • 87
  • Hi, I tried this code, and it replaced Ÿ with a ? in the output. Which still doesn't solve the issue. – Jeevan May 04 '16 at 10:21
  • I think while reading itself Ÿ is read as ? by excel. Ÿ is anyway invisible in excel sheet. – Jeevan May 04 '16 at 10:25
  • As you see in my picture, the "This Œ is œ a Š testš . Ÿ ƒ Blubb." is actual content of my sheet. With this my code works exactly as I have described it. Where do you see the "Ÿ" which is then **not** visible in the sheet? – Axel Richter May 04 '16 at 10:31
  • I see Ÿ when I copy paste that string from the excel cell to a notepad. Thanks for trying to help. We have a solution now Ambie. Thanks for your efforts. – Jeevan May 04 '16 at 10:37