5

I have a access table and i am writing a vba code to remove non-ascii characters from the table, i have tried using below two functions

Public Function removeall(stringData As String) As String
    Dim letter As Integer
    Dim final As String
    Dim i As Integer

    For i = 1 To Len(stringData) 'loop thru each char in stringData

        letter = Asc(Mid(stringData, i, 1)) 'find the char and assign asc value

        Select Case letter 'Determine what type of char it is
            Case Is < 91 And letter > 64 'is an upper case char
                final = final & Chr(letter)
            Case Is < 123 And letter > 96 'is an lower case char
                final = final & Chr(letter)
            Case Is = 32  'is a space
                final = final & Chr(letter)
    End Select

    Next i
    removeall = final

    End Function

And also tried using below function

Public Function Clean(InString As String) As String
'-- Returns only printable characters from InString
   Dim x As Integer
   For x = 1 To Len(InString)
      If Asc(Mid(InString, x, 1)) > 31 And Asc(Mid(InString, x, 1)) < 127 Then
         Clean = Clean & Mid(InString, x, 1)
      End If
   Next x

End Function

But the problem is : In removeall function it removes everything including # and space characters.. And In Clean function also removes special characters as well.

I need a correct function which retains key board characters and removes all other characters

Examples of strings in tables are :

1) "ATTACHMENT FEEDING TUBE FITS 5-18 ºFR# "

2) "CATHETER FOLEY 3WAY SILI ELAST 20FR 30ML LATEXº"

Any help would be greatly appreciated

Output should be like

1) "ATTACHMENT FEEDING TUBE FITS 5-18 FR"

2) "CATHETER FOLEY 3WAY SILI ELAST 20FR 30ML LATEX"

natwar lal
  • 331
  • 3
  • 10
  • "function which retains key board characters" - `#` is both an ASCII character and a 'keyboard character' (whatever that means - I can certainly see it on my keyboard), so why do you want it removed? What do you actually want? – AakashM Nov 27 '17 at 09:11

2 Answers2

4

One approach would be to use a whitelist of accepted characters. e.g.

' You can set up your domain specific list:
Const Whitelist = "1234567890" & _
                  "qwertyuiopasdfghjklzxcvbnm" & _
                  "QWERTYUIOPASDFGHJKLZXCVBNM" & _
                  " `~!@#$%^&*()_-=+[]{};:""'|\<>?/ –"

Public Sub test()

    Debug.Print Clean("ATTACHMENT FEEDING TUBE FITS 5-18 ºFR#")
    Debug.Print Clean("CATHETER FOLEY 3WAY SILI ELAST 20FR 30ML LATEXº")

End Sub

Public Function isAllowed(char As String) As Boolean

    isAllowed = InStr(1, Whitelist, char, vbBinaryCompare) > 0

End Function


Public Function Clean(dirty As String) As String
'-- Returns only printable characters from dirty
   Dim x As Integer
   Dim c As String
   For x = 1 To Len(dirty)
      c = Mid(dirty, x, 1)
      If isAllowed(c) Then
         Clean = Clean & c
      End If
   Next x

End Function
Spangen
  • 4,420
  • 5
  • 37
  • 42
4

Alternate approach that preserves ALL ASCII characters, without working with a whitelist, in a single function:

Public Function RemoveNonASCII(str As String) As String
    Dim i As Integer
    For i = 1 To Len(str)
        If AscW(Mid(str, i, 1)) < 127 Then 'It's an ASCII character
            RemoveNonASCII = RemoveNonASCII & Mid(str, i, 1) 'Append it
        End If
    Next i
End Function
Erik A
  • 31,639
  • 12
  • 42
  • 67
  • It was quite hard to understand from the Op what was actually required. He kept talking about ascii *and* special characters. Hence the white list as an extendable solution above – Spangen Nov 28 '17 at 08:12
  • Yes, I understand your approach. But for future reference (anyone coming here actually seeking to remove non-ASCII characters from a string, by googling for example), mine is simpler and does not require a list of all ASCII characters. It does not remove the `#`, of course, since that's a valid ASCII character – Erik A Nov 28 '17 at 08:15
  • I agree yours is simpler and is what I would have gone for if the OP had simply asked for the retention of ascii. However he also asked to retain keyboard characters. The pound sterling symbol (U+00A3) is on the keyboard but is not ascii. – Spangen Nov 28 '17 at 08:18
  • I don't see how the pound sterling symbol is relevant, `#` is an octothorpe, ASCII character 35. My intention is not to only help OP, but mostly to provide a useful reference – Erik A Nov 28 '17 at 08:23