0

I'm trying to write a function that takes a string and returns it with every accented letter replaced by the same letter without the accent. This function is not working:

function StripAccents(str)

accent   = "ÈÉÊËÛÙÏÎÀÂÔÖÇèéêëûùïîàâôöç"
noaccent = "EEEEUUIIAAOOCeeeeuuiiaaooc"

currentChar = ""
result = ""
k = 0
o = 0

FOR k = 1 TO len(str)
    currentChar = mid(str,k, 1)
    o = InStr(accent, currentChar)
    IF o > 0 THEN
        result = result & mid(noaccent,k,1)
    ELSE
        result = result & currentChar
    END IF
NEXT

StripAccents = result

End function

testStr = "Test : à é À É ç"
response.write(StripAccents(testStr))

This is the result using the above:

Test : E E Eu EE E
greener
  • Accents are a subset of what you're transforming. The correct term is [diacritic](http://en.wikipedia.org/wiki/Diacritic). – Tom Blodget Jul 17 '14 at 04:05

3 Answers

2

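Disregarding possible encoding problems: the replacement letter must be looked up at o (the position found in accent), not at k (the position in the input string). You must change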

result = result & mid(noaccent,k,1)

to

result = result & mid(noaccent,o,1)
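
For reference, here is a minimal sketch of the whole function with that one-line change applied. It assumes the script source is saved in an encoding the host reads correctly, so that each accented letter arrives as a single character; the encoding question is left aside here.

```vbscript
' Sketch of the corrected function: the replacement letter is read at
' position o (the match index in accent), not k (the index in str).
Function StripAccents(str)
    Dim accent, noaccent, currentChar, result, k, o

    accent   = "ÈÉÊËÛÙÏÎÀÂÔÖÇèéêëûùïîàâôöç"
    noaccent = "EEEEUUIIAAOOCeeeeuuiiaaooc"
    result = ""

    For k = 1 To Len(str)
        currentChar = Mid(str, k, 1)
        ' InStr uses a binary (case-sensitive) comparison by default,
        ' so upper- and lower-case letters map separately.
        o = InStr(accent, currentChar)
        If o > 0 Then
            result = result & Mid(noaccent, o, 1)
        Else
            result = result & currentChar
        End If
    Next

    StripAccents = result
End Function
```

Under the encoding assumption above, StripAccents("Test : à é À É ç") should return "Test : a e A E c".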
Ekkehard.Horner
  • Thanks. That was silly. I do seem to have encoding problems as well, though. I'm using UTF-8. Is that not correct? – greener Jul 16 '14 at 19:32
  • @greener - as you can't "use UTF-8" in VBScript, you need to compose a new question that describes your "encoding problems" and what you did to cause them. – Ekkehard.Horner Jul 16 '14 at 19:38
1

I tried the example code with the correction applied, then added more characters, giving:

accent   = "àèìòùÀÈÌÒÙäëïöüÄËÏÖÜâêîôûÂÊÎÔÛáéíóúÁÉÍÓÚðÐýÝãñõÃÑÕšŠžŽçÇåÅøØ"
noaccent = "aeiouAEIOUaeiouAEIOUaeiouAEIOUaeiouAEIOUdDyYanoANOsSzZcCaAoO"

I then realised that there are a few more characters to deal with, namely:

æ
Æ
ß

These need converting first, using a simple Replace, to ae, AE and ss respectively.
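
A minimal sketch of that pre-pass might look like the following. The helper name ExpandLigatures is hypothetical, and the characters are built with ChrW so the snippet does not depend on how the source file is encoded.

```vbscript
' Hypothetical helper (the name is illustrative only): expands the letters
' that map to two characters, so the one-to-one lookup can handle the rest.
Function ExpandLigatures(str)
    Dim result
    ' Replace uses a binary (case-sensitive) comparison by default,
    ' so the lower- and upper-case ligatures are handled separately.
    result = Replace(str, ChrW(230), "ae")      ' U+00E6, ae ligature (lower case)
    result = Replace(result, ChrW(198), "AE")   ' U+00C6, AE ligature (upper case)
    result = Replace(result, ChrW(223), "ss")   ' U+00DF, sharp s
    ExpandLigatures = result
End Function
```

It would be called before the character lookup, for example StripAccents(ExpandLigatures(testStr)).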

With that it works fine, provided the page does not contain <%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%> or a similar directive.

Having meta charset="UTF-8" in the header, however, is not a problem; the conversion still works.

If the code is needed on a page that does include <%@LANGUAGE="VBSCRIPT" CODEPAGE="65001"%>, I do not know of a solution.

Thanks for the code greener, very useful for dealing with the common diacriticals :-)

Viktor
Phil Allen
0

You should probably do a decomposition normalization first (NFD), which splits each accented letter into its base letter followed by one or more combining marks. I think you could do this in VBA using a call to the WinAPI function NormalizeString (https://msdn.microsoft.com/en-us/library/windows/desktop/dd319093(v=vs.85).aspx). Then you could remove the combining-mark (accent) code points.

someprogrammer