
My users copy and paste Arabic text from an Arabic newspaper into a textarea. I'd like to store the Arabic as numeric character references such as &#1575;&#1604; and so on. How do I do that?

When I use the following snippet, I get the wrong numbers. First of all, each character I convert ends up as a 3-digit number, whereas I know Arabic character references are 4 digits.

Dim IncomingArabic, MaxLen, i, curChar, iChr, Encoded
IncomingArabic = Request("IncomingArabic")
MaxLen = Len(IncomingArabic)
For i = 1 To MaxLen
    curChar = Mid(IncomingArabic, i, 1)
    ' curChar is an Arabic character
    iChr = Asc(curChar)  ' this gives me a 3-digit number! Hex(curChar) here gave a "Type mismatch" error.

    Encoded = Encoded & "&#" & iChr & ";"
Next
Response.Write Encoded ' shows gibberish!
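The 3-digit values are consistent with Asc returning the character's code in a single-byte ANSI code page (presumably windows-1256 here, which the question does not state), where every Arabic letter falls between 128 and 255. The 4-digit numbers in &#NNNN; references are Unicode code points (the Arabic block is U+0600 to U+06FF, i.e. 1536 to 1791), which is what VBScript's AscW returns. A browser always reads &#199; as Unicode U+00C7 (a Latin letter), which would explain the gibberish. The Python sketch below only illustrates the difference; the function names are hypothetical.

```python
# Sketch: why an ANSI code yields 3-digit numbers for Arabic letters while the
# desired references are 4-digit. Assumes the ANSI code page is windows-1256.

def ansi_codes(text, codepage="cp1256"):
    """Single-byte code of each character in an ANSI code page (roughly what Asc reports)."""
    return [ch.encode(codepage)[0] for ch in text]


def unicode_refs(text):
    """Decimal numeric character references from Unicode code points (roughly what AscW gives)."""
    return "".join("&#%d;" % ord(ch) for ch in text)


word = "\u0627\u0644"  # ARABIC LETTER ALEF, ARABIC LETTER LAM

codes = ansi_codes(word)
print(codes)               # each value is between 128 and 255, i.e. three digits
print(unicode_refs(word))  # &#1575;&#1604;

# A browser interprets &#199; etc. as Unicode code points, i.e. Latin letters
print("".join(chr(c) for c in codes))
```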
Asked by Average Joe; edited by AnthonyWJones

2 Answers


Here is what I would do: switch everything to UTF-8. Make sure the page that posts the form is sent with Response.CharSet = "UTF-8" and Response.CodePage = 65001, and do the same on the receiving page. Then you don't need any manual conversion, whatever language your users paste in.
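Why the code pages on both sides must match can be shown outside ASP. The Python sketch below assumes the browser posts UTF-8 bytes and compares decoding them as UTF-8 (the counterpart of Response.CodePage = 65001) with decoding them as windows-1256 (an assumed mismatched code page). The mismatched decode turns every UTF-8 byte into a separate character, which is the kind of gibberish described in the question.

```python
# Sketch: a UTF-8 form post decoded with a matching vs. a mismatched code page.

original = "\u0627\u0644\u0639\u0631\u0628\u064a\u0629"  # "العربية" (Arabic)

posted_bytes = original.encode("utf-8")  # 2 bytes per Arabic letter

# Receiving side uses the same encoding as the sender
decoded_ok = posted_bytes.decode("utf-8")

# Receiving side assumes windows-1256: one character per byte
decoded_bad = posted_bytes.decode("cp1256", errors="replace")

print(decoded_ok == original)           # True
print(decoded_bad == original)          # False
print(len(original), len(decoded_bad))  # 7 14
```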

Answered by AnthonyWJones

Well, I sorted it out. Just use the Arabize function below.

' example usage
Response.Write Arabize(Request("IncomingArabic")) ' gives you the correct 4-digit sequence!


Function Arabize(Str)
  Dim Bytes, FromCharset, ToCharset, temp
  FromCharset = "windows-1256"
  ToCharset = "windows-1256"
  Bytes = StringToBytes(Str, FromCharset)
  temp = BytesToString(Bytes, ToCharset)
  Arabize = Server.HTMLEncode(temp)
End Function

' You will also need the helpers below.
Const adTypeBinary = 1
Const adTypeText = 2

' Accept a string and convert it to a byte array in the selected charset
Function StringToBytes(Str,Charset)
  Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
  Stream.Type = adTypeText
  Stream.Charset = Charset
  Stream.Open
  Stream.WriteText Str
  Stream.Flush
  Stream.Position = 0
  ' Rewind the stream and read it back as bytes
  Stream.Type = adTypeBinary
  StringToBytes= Stream.Read
  Stream.Close
  Set Stream = Nothing
End Function

' Accept a byte array and convert it to a string using the selected charset
Function BytesToString(Bytes, Charset)
  Dim Stream : Set Stream = Server.CreateObject("ADODB.Stream")
  Stream.Charset = Charset
  Stream.Type = adTypeBinary
  Stream.Open
  Stream.Write Bytes
  Stream.Flush
  Stream.Position = 0
  ' Rewind the stream and read it back as text
  Stream.Type = adTypeText
  BytesToString= Stream.ReadText
  Stream.Close
  Set Stream = Nothing
End Function
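As AnthonyWJones asks in the first comment, encoding to windows-1256 and immediately decoding from windows-1256 should give back the original string for any text windows-1256 can represent, so the round trip inside Arabize probably changes nothing. The Python sketch below mirrors that round trip and then shows one way to get 4-digit &#NNNN; references directly, using Python's `xmlcharrefreplace` error handler (a Python feature, not something Classic ASP provides). The helper names are hypothetical.

```python
import html


def round_trip(text, charset="cp1256"):
    """Encode then decode with the same charset, mirroring StringToBytes + BytesToString."""
    return text.encode(charset).decode(charset)


def to_numeric_refs(text):
    """Escape &, <, > and quotes, then turn every non-ASCII character into &#NNNN;."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


arabic = "\u0627\u0644\u0639\u0631\u0628\u064a\u0629"  # "العربية"

print(round_trip(arabic) == arabic)                  # True: nothing changed
print(to_numeric_refs("<b>" + arabic[:2] + "</b>"))  # &lt;b&gt;&#1575;&#1604;&lt;/b&gt;
```

The `html.escape` step is there because `xmlcharrefreplace` only touches characters outside ASCII, so markup characters would otherwise pass through unescaped.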
Answered by Average Joe; edited by AnthonyWJones
  • If I were to do `y = BytesToString(StringToBytes(x, "Windows-1256"), "Windows-1256")`, why would `x` not be identical to `y`? If they are identical, why not simply do `Response.Write Server.HTMLEncode(Request.Form("IncomingArabic"))`? – AnthonyWJones Mar 09 '12 at 12:13
  • The weirdest thing happened. The next day, the gibberish was back. I tried everything (including your suggestions). Then I started cutting code to find the culprit, and it was in the least expected place: the problem went away when I removed the meta tag that sets UTF-8! It totally works now. The working solution is as simple as it gets: no UTF-8 in the HTML header, and no setting of the code page or charset on the ASP side. How UTF-8 turned out to be the cause of my two-day hassle, I don't know. – Average Joe Mar 11 '12 at 14:20
  • Perhaps my answer to this question: http://stackoverflow.com/questions/916118/classic-asp-how-to-convert-string-to-utf8-to-usc2 might provide some insight. – AnthonyWJones Mar 11 '12 at 16:01
  • I read your fantastic answer and upvoted it. Why do you think the whole thing started working when I removed the UTF-8 meta tag and all the Response.CharSet and CodePage settings? – Average Joe Mar 12 '12 at 03:36
  • There are too many factors missing from your question to say for sure. The key thing is to ensure that the response code page is set to 65001 before any access of form values, and that the sending form submits in UTF-8 encoding. Which browsers are you testing with? – AnthonyWJones Mar 13 '12 at 09:04