I'm trying to display a character stored in a database: the Unicode character \u0096. Because of a quirk in how Windows encodings and web browsers interact, this is a control character in the Unicode standard, yet web pages display it as an en dash. See @AlanMoore's answer on "Some UTF-8 characters do not show up on browser".

I have the following JSP file. I want to display the \u0096 character as an en dash (a feat that other front-end solutions can accomplish).

<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>
<%@ page session="false" trimDirectiveWhitespaces="true"%>
<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core"%>
<!doctype html>
<html>

<c:set var="control" scope="request" value= "b"/>
<c:set var="endash" scope="request" value="a"/>
<% request.setAttribute("control", "\u0096");%>
<% request.setAttribute("endash", "\u2013");%>

Match? 0096: <c:out value="${control}"/> 2013: <c:out value="${endash}"/>

The output that I get is

Match? 0096:  2013: –

What I want is

Match? 0096: – 2013: –
MrBrightside

1 Answer

The character denoted by \u0096, i.e. U+0096, is unambiguously a control character in Unicode, with undefined meaning. This should not be confused with the fact that in the windows-1252 encoding, the byte 0x96 denotes U+2013 EN DASH.
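
To make the distinction concrete, here is a minimal sketch that decodes the single byte 0x96 under two encodings. It assumes the stray U+0096 arose from reading the byte as ISO-8859-1 (an assumption, not something stated in the question), and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteDecodingDemo {
    // Decodes one byte with the given charset and returns the resulting code point.
    static int decodeSingleByte(int b, Charset cs) {
        return new String(new byte[] {(byte) b}, cs).codePointAt(0);
    }

    public static void main(String[] args) {
        // ISO-8859-1 maps every byte to the code point with the same value.
        System.out.printf("ISO-8859-1:   U+%04X%n",
                decodeSingleByte(0x96, StandardCharsets.ISO_8859_1));
        // windows-1252 assigns printable characters to most bytes in 0x80-0x9F.
        System.out.printf("windows-1252: U+%04X%n",
                decodeSingleByte(0x96, Charset.forName("windows-1252")));
    }
}
```

The first line should report U+0096 (the control character) and the second U+2013 (EN DASH).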

Thus, instead of trying to render an invisible character as visible, you should simply replace U+0096 with U+2013 or, depending on the actual setup, convert the data you get from the database from windows-1252 to, e.g., UTF-16. It is unlikely that the database contains something meant to be U+0096. Rather, it contains bytes that are now being misinterpreted as UTF-16 but are actually windows-1252 encoded representations of characters.
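
If fixing the database read itself is not an option, the replacement could be done on the Java side before the value reaches the JSP. The following is only a sketch under the assumption that every C1 control character (U+0080 to U+009F) in the string is really a misread windows-1252 byte; `C1Repair` and `repair` are hypothetical names, not part of any library.

```java
import java.nio.charset.Charset;

class C1Repair {
    // Assumption: stray C1 controls originated as windows-1252 bytes.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Replaces each C1 control character with the character that
    // windows-1252 assigns to the byte of the same value.
    static String repair(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x80 && c <= 0x9F) {
                char decoded = new String(new byte[] {(byte) c}, CP1252).charAt(0);
                // Bytes with no windows-1252 mapping may decode to U+FFFD;
                // in that case keep the original character untouched.
                out.append(decoded == '\uFFFD' ? c : decoded);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fixed = repair("\u0096");
        System.out.printf("U+%04X%n", (int) fixed.charAt(0));
    }
}
```

In the JSP from the question, this would amount to something like `request.setAttribute("control", C1Repair.repair(valueFromDatabase))`, where `valueFromDatabase` stands in for whatever the application actually reads. Characters outside the C1 range, including a genuine U+2013, pass through unchanged.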

Jukka K. Korpela