My Classic ASP application retrieves a UTF-8 string from its database, but I need to convert it to ISO-8859-1. I can't change the HTML page encoding, so I need to convert just the fetched string.

How can I do it?

user692942
Metalcoder
  • How about searching the site? This has been answered, by yours truly and others, umpteen times already! Just browse the **Related** section on the right-hand side of this question, for example. – user692942 Mar 03 '15 at 15:15
  • possible duplicate of [convert utf-8 to iso-8859-1 in classic asp](http://stackoverflow.com/questions/17677180/convert-utf-8-to-iso-8859-1-in-classic-asp) – user692942 Mar 03 '15 at 15:16
  • @Lankymart I've seen that question, but its answer deals with the entire script file. I want to convert a string that originated in a database, so it isn't even written literally in that file. I need to convert only a single string and keep the rest of the file as it is (ISO-8859-1). But I am new to Classic ASP, and I may not be grasping the answer correctly. Given this clarification, do you still think it's a duplicate? If I'm wrong, I'll ask for clarification in the comments on that answer. – Metalcoder Mar 03 '15 at 16:16
  • @Lankymart I've been searching for this answer for DAYS, and tried searching here too. The related questions are not useful, as they deal mainly with other languages. I avoid asking precisely because people are extremely picky about new questions. There have been too many times when I asked a question on the StackExchange sites I visit and had to fight to keep it open. Sorry about the rant. – Metalcoder Mar 03 '15 at 16:26
  • Did you see this one - [ASP: I can´t decode some character from utf-8 to iso-8859-1](http://stackoverflow.com/questions/21751893/asp-i-can%C2%B4t-decode-some-character-from-utf-8-to-iso-8859-1)? – user692942 Mar 03 '15 at 17:21
  • @Lankymart No, I haven't. It didn't show up in my search. The answer that I posted seems similar, but I don't know enough to say for sure. – Metalcoder Mar 03 '15 at 17:23
  • I tend to find you get better results searching from [Google](https://www.google.co.uk/search?num=100&safe=off&espv=2&q=site:stackoverflow.com+convert+utf-8+iso-8859-1+asp&oq=site:stackoverflow.com+convert+utf-8+iso-8859-1+asp&gs_l=serp.3...344145.351116.0.351896.25.25.0.0.0.3.139.1623.19j5.24.0.msedr...0...1c.1.62.serp..22.3.243.jAJXRwbdzis&gws_rd=cr&ei=su71VOC_K-_X7AbnzIFI). – user692942 Mar 03 '15 at 17:26
  • I use Google without the `site:stackoverflow.com` to get broader results, but it really does seem to work better than SO search. Thanks. – Metalcoder Mar 03 '15 at 17:28

2 Answers

I found the answer here:

Const adTypeBinary = 1
Const adTypeText = 2

' accept a string and convert it to Bytes array in the selected Charset
Function StringToBytes(Str,Charset)
  Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
  Stream.Type = adTypeText
  Stream.Charset = Charset
  Stream.Open
  Stream.WriteText Str
  Stream.Flush
  Stream.Position = 0
  ' rewind stream and read Bytes
  Stream.Type = adTypeBinary
  StringToBytes= Stream.Read
  Stream.Close
  Set Stream = Nothing
End Function

' accept Bytes array and convert it to a string using the selected charset
Function BytesToString(Bytes, Charset)
  Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
  Stream.Charset = Charset
  Stream.Type = adTypeBinary
  Stream.Open
  Stream.Write Bytes
  Stream.Flush
  Stream.Position = 0
  ' rewind stream and read text
  Stream.Type = adTypeText
  BytesToString= Stream.ReadText
  Stream.Close
  Set Stream = Nothing
End Function

' Re-encode a string: write it out as bytes in FromCharset, then read those
' bytes back as ToCharset (e.g. windows-1252 to windows-1251, or
' ISO-8859-1 to UTF-8 as in the call below)
Function AlterCharset(Str, FromCharset, ToCharset)
  Dim Bytes
  Bytes = StringToBytes(Str, FromCharset)
  AlterCharset = BytesToString(Bytes, ToCharset)
End Function

So I just did this:

str = AlterCharset(str, "ISO-8859-1", "UTF-8")

And it worked nicely.
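
What that call computes can be illustrated outside ASP. The Python sketch below is a hypothetical illustration, not part of the original answer; Python is used only so the byte-level steps can run without IIS or ADODB. `alter_charset` mirrors `AlterCharset` by encoding the string with the "from" charset and decoding the resulting bytes with the "to" charset. It assumes the database text reaches the page with each UTF-8 byte read as one ISO-8859-1 character, which is the garbling this call undoes.

```python
def alter_charset(text: str, from_charset: str, to_charset: str) -> str:
    """Mirror of AlterCharset: text -> bytes in from_charset -> text via to_charset."""
    raw = text.encode(from_charset)  # plays the role of StringToBytes(Str, FromCharset)
    return raw.decode(to_charset)    # plays the role of BytesToString(Bytes, ToCharset)


# "Ação" stored as UTF-8 is the byte sequence 41 C3 A7 C3 A3 6F.
utf8_bytes = "Ação".encode("utf-8")

# Reading those bytes one per character as ISO-8859-1 garbles the accents.
garbled = utf8_bytes.decode("iso-8859-1")
print(garbled)  # AÃ§Ã£o

# Writing the garbled text back out as ISO-8859-1 recovers the original
# UTF-8 bytes, and decoding them as UTF-8 restores the accented string.
fixed = alter_charset(garbled, "iso-8859-1", "utf-8")
print(fixed)  # Ação
```

Note the direction: the string is encoded *as* ISO-8859-1 and decoded *as* UTF-8, because the call repairs text that was read with the wrong charset rather than producing ISO-8859-1 bytes directly.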

Metalcoder
  • This will work, but bear in mind that if your source encoding is `UTF-8` and you convert to `ISO-8859-1`, then depending on the characters involved you will hit mismatches wherever a character in the UTF-8 data doesn't exist in `ISO-8859-1`. – user692942 Mar 03 '15 at 17:24
  • @Lankymart Since UTF-8 has a larger character set than ISO-8859-1, those mapping issues are something that I expect. Or is there another way to work around it? – Metalcoder Mar 03 '15 at 17:31
  • Not really; if you are aware of it, that's half the battle. You'd be surprised how many people just expect it to work. – user692942 Mar 03 '15 at 17:32
  • I guess what I'm always wondering with encoding questions like this one is why you need to convert `UTF-8` data to `ISO-8859-1`. Usually it comes down to a legacy system with lots of pages saved as `Windows-1252`, for example, and conversion is seen as a quick fix; in the long run, providing proper `UTF-8` support from server to client is the way to go. – user692942 Mar 03 '15 at 17:36
  • In my case, it's because the database is in UTF-8 and the script files are saved in ISO-8859-1. Since the script that I am working on is part of a bigger system and I can't change the database encoding, I have to fetch the string and then convert it. If you don't need accented characters to work (as is the case when the app deals only with English), then you may not need to do this, but here in Brazil we use accents all the time. – Metalcoder Mar 03 '15 at 17:42
  • I've worked with [tag:multilanguage] systems in Classic ASP for years, and trust me when I say that using a method like this you will hit problems; the only "real" fix is to fix the encoding in the application (the encoding of the files and of the response from Classic ASP) to allow for [tag:internationalization] by supporting `UTF-8` to provide true [tag:localization]. – user692942 Mar 03 '15 at 17:48
  • I agree with you. Unfortunately, I can't change the encoding right now, since it would make the project late, and I am under a lot of pressure to finish it soon. I intend to change it in the next iteration. – Metalcoder Mar 03 '15 at 17:56
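
The mismatch raised in the first comment is easy to see at the byte level. In this hypothetical Python sketch (not from the thread), `to_latin1` encodes text as ISO-8859-1 using Python's `errors="replace"` handler, which substitutes `?` for characters that have no ISO-8859-1 code; ASP and ADODB may handle unmappable characters differently. Portuguese accents survive, while Polish letters and the euro sign do not.

```python
def to_latin1(text: str) -> bytes:
    """Encode text as ISO-8859-1, replacing unmappable characters with '?'."""
    return text.encode("iso-8859-1", errors="replace")


# Every character of Portuguese text like this fits in ISO-8859-1.
print(to_latin1("ação"))  # b'a\xe7\xe3o'

# Ł (U+0141) and ź (U+017A) are outside ISO-8859-1, so they are lost.
print(to_latin1("Łódź"))  # b'?\xf3d?'

# The euro sign is not in ISO-8859-1, although Windows-1252 has it at 0x80.
print(to_latin1("10 €"))  # b'10 ?'
print("10 €".encode("cp1252"))  # b'10 \x80'
```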

To expand on the OP's self-answer: when converting from a single-byte character set (such as ISO-8859-1, Windows-1251 or Windows-1252) to UTF-8, there is needless redundancy in converting to and back from ADODB's byte array. A single stream can do the whole job, which eliminates the extra function calls and conversions:

Const adTypeText = 2

Private Function AsciiStringToUTF8(AsciiString)
    Dim objStream: Set objStream = CreateObject("ADODB.Stream")
    Call objStream.Open()
    objStream.Type = adTypeText
    'Any single-byte charset should work in theory
    objStream.Charset = "Windows-1252"
    Call objStream.WriteText(AsciiString)
    ' Rewind to the start, then reread the same bytes as UTF-8
    objStream.Position = 0
    objStream.Charset = "UTF-8"
    AsciiStringToUTF8 = objStream.ReadText()
    Call objStream.Close(): Set objStream = Nothing
End Function
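
One caveat on the "any single-byte charset should work" comment: the trick only reliably round-trips when the charset used to write the text matches the one the UTF-8 bytes were mistakenly read with. The hypothetical Python sketch below is an illustration, not the answer's code: `reread_as_utf8` mirrors `AsciiStringToUTF8` with a configurable charset. It repairs text garbled through Windows-1252 when given `cp1252`, but Windows-1251 (`cp1251`) cannot even encode the garbled characters. Python raises `UnicodeEncodeError` here; ADODB may instead substitute characters silently.

```python
def reread_as_utf8(garbled: str, single_byte_charset: str) -> str:
    """Write text as single_byte_charset bytes, then read those bytes as UTF-8."""
    return garbled.encode(single_byte_charset).decode("utf-8")


original = "ação"
# Simulate UTF-8 bytes that were decoded as Windows-1252 somewhere upstream.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # aÃ§Ã£o

# Matching charset: the original bytes come back and decode cleanly.
print(reread_as_utf8(garbled, "cp1252"))  # ação

# Mismatched charset: Windows-1251 (Cyrillic) has no code for "Ã".
try:
    reread_as_utf8(garbled, "cp1251")
except UnicodeEncodeError as exc:
    print("cp1251 failed:", exc.reason)
```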
Makaveli84