I have this code, but I still can't replace non-English characters (Vietnamese or Thai, for example) in my data with a simple "placeholder".

Sub NonLatin()
Dim cell As Range
    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        s = cell.Value
            For i = 1 To Len(s)
                If Mid(s, i, 1) Like "[!A-Za-z0-9@#$%^&* * ]" Then cell.Value = "placeholder"
            Next
    Next
End Sub

Appreciate your help

wilson kao
  • Also wouldn't you need an i and a cell after your NEXT statements? – Luuklag Aug 07 '17 at 10:31
  • Have a look at using `RegEx` instead – Tom Aug 07 '17 at 10:40
  • @Luuklag you don't *have to* include the counter variable after the `Next` statement, it's just good practice as it increases readability. See [this question](https://stackoverflow.com/questions/21993482/vba-why-do-people-include-the-variables-name-in-a-next-statement) – Wolfie Aug 07 '17 at 10:42
  • @Wilson are you trying to replace the non-English characters with a placeholder, or change the value of the entire cell if it contains a non-English character? You may find [this article](https://www.di-mgt.com.au/howto-convert-vba-unicode-to-utf8.html) useful, which contains code to convert strings to UTF-8 characters and in-fill unknown characters with `?` – Wolfie Aug 07 '17 at 10:52
  • @Wolfie Good to know, still not too old to learn something ;) – Luuklag Aug 07 '17 at 11:30

2 Answers

You can replace any characters outside a given range, e.g. ASCII (the first 128 characters), with a placeholder using the code below:

Option Explicit

Sub Test()

    Dim oCell As Range

    With CreateObject("VBScript.RegExp")
        .Global = True
        ' Match any character outside the ASCII range (U+0000 to U+007F)
        .Pattern = "[^\u0000-\u007F]"
        For Each oCell In [A1:C4]
            oCell.Value = .Replace(oCell.Value, "*")
        Next
    End With

End Sub
omegastripes

See this question for details about using Regular Expressions in your VBA code.


Then use regular expressions in a function like this one to process strings. Here I am assuming you want to replace each invalid character with a placeholder, rather than the entire string. If you want to replace the entire string, you don't need to check individual characters: you can use the + or * quantifiers in your regular expression's pattern to match multiple characters and test the whole string at once.

Function LatinString(ByVal str As String) As String
    ' After including a reference to "Microsoft VBScript Regular Expressions 5.5"
    ' Set up the regular expressions object
    Dim regEx As New RegExp
    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = False
        ' This is the pattern of ALLOWED characters. 
        ' Note that special characters should be escaped using a backslash e.g. \$ not $
        .Pattern = "[A-Za-z0-9]"
    End With

    ' Loop through characters in string. Replace disallowed characters with "?"
    Dim i As Long
    For i = 1 To Len(str)
        If Not regEx.Test(Mid(str, i, 1)) Then
            str = Left(str, i - 1) & "?" & Mid(str, i + 1)
        End If
    Next i
    ' Return output
    LatinString = str
End Function

You can use this in your code like this:

Dim cell As Range
For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
    cell.Value = LatinString(cell.Value)
Next
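
If you want the whole-cell behaviour mentioned earlier instead (swap the entire value for "placeholder" as soon as one disallowed character is found), a minimal sketch could look like the one below. The names `HasDisallowedChars` and `ReplaceWholeCells` are hypothetical, and the allowed set is the same simplified `[A-Za-z0-9]` used above, so spaces and punctuation also count as disallowed unless you add them to the pattern. It uses late binding via `CreateObject`, so no library reference is needed, and it assumes the cells in column A contain no error values, since `CStr` would fail on those.

```vba
Option Explicit

' Hypothetical helper: True if txt contains at least one character
' outside the allowed set A-Z, a-z, 0-9.
Function HasDisallowedChars(ByVal txt As String) As Boolean
    With CreateObject("VBScript.RegExp")
        ' A negated character class matches any single disallowed character
        .Pattern = "[^A-Za-z0-9]"
        HasDisallowedChars = .Test(txt)
    End With
End Function

' Hypothetical caller: replaces the whole cell value when any
' disallowed character is present. Assumes no error values in column A.
Sub ReplaceWholeCells()
    Dim cell As Range
    For Each cell In Range("A1", Cells(Rows.Count, "A").End(xlUp))
        If HasDisallowedChars(CStr(cell.Value)) Then cell.Value = "placeholder"
    Next cell
End Sub
```

Because `Test` stops at the first match, there is no need to loop over the characters of each string here.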

For a byte-level method that converts a Unicode string to a UTF-8 string without using regular expressions, check out this article.

Wolfie
  • Why not ignore case and use a simpler expression? – Tom Aug 07 '17 at 12:32
  • You could well do that @Tom, I was keeping the example as similar as possible to [a simplified version of] the OP's pattern, and the example given in the linked question. It would be even neater to leave out the line I included as `IgnoreCase = False` is the default - I was just showing some options! :) – Wolfie Aug 07 '17 at 13:08