
I'm rewriting one of our forms so that it can handle international domains, but before I start on that I thought I would test how IDNs are handled.

My thinking was that if someone enters a domain like http://清华大学.cn, I would store it as Punycode in my database. With this in mind, I found the Java IDN handler and wrote this bit of test code:

<cfset strUrl = "http://清华大学.cn" />
<cfoutput>
#strUrl#
</cfoutput>
<!--- Convert the string to its ASCII (Punycode) form using java.net.IDN. --->
<cfset jUrl = CreateObject( "java", "java.net.IDN" ).toAscii(strUrl) />

<cfoutput>
#jUrl#
</cfoutput>

However, upon running this I get an error:

A prohibited code point was found in the inputhttp://æ ̧...å�žå¤§å¦

This occurs on the .toAscii line. What have I missed?

Jarede
  • Wouldn't Ascii de facto prohibit these higher character sets? Isn't there a "toUtf8()" method or something? – Mark A Kruger May 10 '12 at 15:00
  • I'm not sure what you're saying. Are you suggesting I need to convert the string before I convert it to ASCII? – Jarede May 10 '12 at 15:09
  • http://stackoverflow.com/questions/1510794/whats-the-proper-technical-term-for-high-ascii-characters – Sharondio May 10 '12 at 15:15
  • I'm saying that a string with characters above 512 (the extended ASCII set) is not convertible to ASCII at all... ASCII is by its nature a limited western character set. To convert the "international" equivalent to ASCII is generally UTF-8 (which has 50,000 placeholders instead of just 512). Does that make sense? – Mark A Kruger May 10 '12 at 15:18

1 Answer

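I added <cfprocessingdirective pageencoding="utf-8"> to the top of my page, and the Java IDN class now encodes and decodes the Chinese correctly. Without the directive, ColdFusion presumably read the template in a different default encoding, so the string literal was already garbled before it reached toAscii, which would explain the mojibake in the error message.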
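To illustrate why the page encoding matters, here is a minimal Java sketch (Java because the call in question goes to java.net.IDN). It feeds the same host name to IDN.toASCII twice: once decoded correctly, and once after its UTF-8 bytes have been re-read with a single-byte charset. The class name IdnEncodingDemo and the helpers misdecode and isRejected are hypothetical, and ISO-8859-1 is only an assumed stand-in for whatever default encoding the server actually used.

```java
import java.net.IDN;
import java.nio.charset.StandardCharsets;

final class IdnEncodingDemo {
    // Unicode escapes for 清华大学.cn, so this file's own encoding cannot affect the demo.
    static final String HOST = "\u6e05\u534e\u5927\u5b66.cn";

    // Simulates a template saved as UTF-8 but read with a single-byte charset.
    // ISO-8859-1 is an assumption; the post does not say what the server default was.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Returns true if IDN.toASCII refuses the input.
    static boolean isRejected(String host) {
        try {
            IDN.toASCII(host);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String ascii = IDN.toASCII(HOST);
        System.out.println("correctly decoded: " + ascii);
        System.out.println("round trip equal:  " + IDN.toUnicode(ascii).equals(HOST));
        System.out.println("mis-decoded input rejected: " + isRejected(misdecode(HOST)));
    }
}
```

The mis-decoded string contains C1 control characters, which the IDN preparation step prohibits, so the conversion fails before any Punycode encoding happens.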

I found this solution here in the comments section:

http://www.bennadel.com/blog/1155-Cleaning-High-Ascii-Values-For-Web-Safeness-In-ColdFusion.htm
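One further point about the test code in the question: it passes the whole URL, scheme included, to the IDN class. As far as I can tell, java.net.IDN works on dot-separated host labels and knows nothing about URLs, so "http://" ends up inside the first encoded label. The sketch below converts only the host part. The class IdnUrlHost and the helper hostToAscii are hypothetical names, and the parsing is deliberately naive: it assumes a plain scheme://host/path shape with no userinfo, port, or query directly after the host.

```java
import java.net.IDN;

final class IdnUrlHost {
    // Hypothetical helper, not part of java.net.IDN: converts only the host of a
    // simple "scheme://host/path" string to Punycode and leaves the rest untouched.
    // Assumes no userinfo, no port, and no query string directly after the host.
    static String hostToAscii(String url) {
        int schemeEnd = url.indexOf("://");
        int hostStart = (schemeEnd < 0) ? 0 : schemeEnd + 3;
        int hostEnd = url.indexOf('/', hostStart);
        if (hostEnd < 0) {
            hostEnd = url.length();
        }
        String host = url.substring(hostStart, hostEnd);
        return url.substring(0, hostStart) + IDN.toASCII(host) + url.substring(hostEnd);
    }

    public static void main(String[] args) {
        // Unicode escapes for 清华大学.cn keep the example independent of file encoding.
        String url = "http://\u6e05\u534e\u5927\u5b66.cn/index.html";
        System.out.println("host only: " + hostToAscii(url));
        // For comparison: the whole URL treated as if it were a host name.
        System.out.println("whole URL: " + IDN.toASCII("http://\u6e05\u534e\u5927\u5b66.cn"));
    }
}
```

For storing domains in a database, stripping the scheme before conversion (or storing only the host) is probably the safer choice, since the stored value then stays a valid host name.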

Jarede