Questions tagged [character-reference]

A character reference is the "&...;" notation used to reference characters that have special meaning, or are otherwise problematic to use, in markup languages such as HTML, XML, SGML and similar. Use this tag if character references are a central topic of a question, combined with the tag for the language you are using (e.g. HTML).
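
As a minimal sketch of the notation described above, the Python snippet below decodes named, decimal, and hexadecimal references with the standard library's `html.unescape`, and produces references for markup-significant characters with `html.escape`. The sample strings are made up for illustration and do not come from any question listed here.

```python
import html

# One named, one decimal, and one hexadecimal reference, plus a named
# reference embedded in ordinary text (all illustrative samples).
samples = ["&amp;", "&#233;", "&#xE9;", "caf&eacute;"]

# html.unescape replaces each reference with the character it denotes.
decoded = [html.unescape(s) for s in samples]

# html.escape goes the other way for characters that are special in markup.
escaped = html.escape("a < b & c")

print(decoded)
print(escaped)
```

Both functions follow HTML rules; XML predefines only five named entities, so an XML parser without a DTD would reject a reference such as `&eacute;`.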

16 questions
11
votes
2 answers

Validation Failed: "EntityRef: expecting ';'"

Hi I've got some XML that won't validate. I've narrowed down the problem to this bit:
Laxmidi
5
votes
2 answers

How can I convert HTML character references (ף) to regular UTF-8?

I have some Hebrew websites that contain character references like: נוף I can only view these letters if I save the file as .html and view in UTF-8 encoding. If I try to open it as a regular text file then UTF-8 encoding does not…
ufk
4
votes
2 answers

Spec justification for &#x80; to &#x9F; in UTF-8 documents browser behaviour wanted

The HTML 4.01 spec says for hexadecimal character references Numeric character references specify the code position of a character in the document character set. So if the document character set encoding is UTF-8, the numeric references should…
Alohci
4
votes
0 answers

Python xml: encode numeric character references in hex form

I have a number of scripts which get external data and update parts of XML files. I use lxml in my Python script and it saves character references in decimal notation, for example: $ cat input.xml
Fedor Dikarev
4
votes
2 answers

Not allowed decimal numeric character reference: forbidden or text?

According to HTML 5.1 spec :: Decimal numeric character reference: The ampersand must be followed by a "#" (U+0023) character, followed by one or more ASCII digits, representing a base-ten integer that corresponds to a Unicode code point that…
user1180790
3
votes
3 answers

&#145; &#147; &#233; == ‘ “ é, but on what encoding/reference?

I have a ColdFusion script that does: Which replaces &147; by ". Google understands this too: if you type &#145; &#147; &#233; at its search box, it's transformed on the results page to ‘ “…
inerte
2
votes
1 answer

How to generate real UTF-8 XML with grails without the escape characters?

I have been wondering why, when I set the encoding to UTF-8 and render the XML, it replaces the extended characters with escape characters (or character references) like ’ instead of '? I'm using the Render method render(contentType:"text/xml",…
Sauleil
2
votes
1 answer

HTML character reference display problems

I'm currently developing a site in Joomla, and one of the components I'm using makes use of a PHP file to administer the language (english.php, spanish.php). The problem I'm having is that if I use the plain text version of e.g. "á", it will show up…
Bren
1
vote
1 answer

How to test for character references in Symfony with PHPUnit?

I want to test this very simple page generated by my PHP/Symfony project
Simple ! Tranquille ! Excellent !
(It's in French, so it needs the &nbsp; hard spaces in front of the exclamation points.) I thought an…
Jean-David Lanz
1
vote
2 answers

How to escape strings with numeric character references in Java

Hello and thank you for reading my post. The Apache Commons StringEscapeUtils.escapeHtml3() and StringEscapeUtils.escapeHtml4() functions make it possible, in particular, to convert accented characters (like é, à...) in a string into character entity…
1
vote
2 answers

Javascript String argument with character reference

I have a JavaScript method call with a string parameter. The string text sometimes contains HTML character references, e.g. ' I am getting an unexpected identifier error. If I have the character reference as " then it works fine. Not…
Eqbal
0
votes
1 answer

lxml - keep input symbols, disable entity conversion

If the following string is read and output using lxml, the umlauts are converted to entities. import xml.etree.ElementTree as ET root = ET.fromstring("Die Häuser haben Dächer.") as_text =…
user3033490
0
votes
1 answer

Is there a way to have an XmlReader preserve a character reference as text rather than converting it?

I'm using an XML reader to parse some XML and I'm wondering if I can have it read in a character entity reference as straight text rather than converting it to the actual character. So if I called ReadInnerXml() on the node:
Josh
0
votes
5 answers

How to get the length of a string containing character references while counting the character references as one single character?

How can I get the length of string that also contains character references? I want to count only the number of characters which will be displayed in the browser. Like $raw = "Stack�f9" = Length = 6 $raw = "Stack12345" = Length = 10 $raw…
Novice
0
votes
2 answers

Java SAX parser, How do I prevent character references entirely? (DoS attack)

The XML files of incoming requests need to be validated. One requirement is that character references are prevented entirely because of possible DoS attacks. If I configure the SAXParserFactory like below: SAXParserFactory spf =…
My-Name-Is