Highest Voted 'noncharacter' Questions

102

votes

3 answers

Really Good, Bad UTF-8 example test data

So we have the XSS cheat sheet to test our XSS filtering, but other than an example benign page I can't find any evil or malformed test data to make sure that my UTF-8 code can handle misbehaving data. Where can I find some good uh.. bad data to…

unicode utf-8 noncharacter

asked Aug 23 '09 at 17:06

Xeoncross

55,620
80
262
364

56

votes

2 answers

What's the purpose of the noncharacters U+FDD0 to U+FDEF?

U+FFFE needs to be a noncharacter in order to allow the Byte Order Mark to work. U+FFFF is described in The Unicode Standard as "useful for internal purposes as sentinels". Makes sense. But I can't figure out, and The Unicode Standard doesn't…

unicode noncharacter

asked Mar 04 '11 at 01:27

dan04

87,747
23
163
198

26

votes

4 answers

Can a valid Unicode string contain FFFF? Is Java/CharacterIterator broken?

Here's an excerpt from java.text.CharacterIterator documentation: This interface defines a protocol for bidirectional iteration over text. The iterator iterates over a bounded sequence of characters. [...] The methods previous() and next() are…

java string unicode noncharacter

asked Aug 14 '10 at 09:03

polygenelubricants

376,812
128
561
623

3

votes

1 answer

What is a "noncharacter" in unicode?

I don't know what "noncharacter" characters are. They are forbidden Unicode characters, though I can copy and paste them, like U+FFFF. If a character has a fixed position in Unicode, and can be used to display something, then: Why are those…

unicode noncharacter

asked Mar 29 '21 at 20:48

InfiniteUniverse

31
1

1

vote

4 answers

Unicode Noncharacters

Is there a good resource for finding the last two characters of each plane, particularly planes 3–13? Obviously 0xFFFE and 0xFFFF are noncharacters, as well as 0x10FFFE and 0x10FFFF, but I can't find a complete list as to where the last characters…

unicode noncharacter

asked Feb 08 '17 at 14:15

Joe Caraccio

1,899
3
24
41
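
For the question above, a minimal Python sketch that lists the plane-ending noncharacters. It assumes the standard layout of 17 planes (0 through 16), where the last two code points of every plane are noncharacters, plus the contiguous range U+FDD0 to U+FDEF. The function name `noncharacters` is illustrative and does not come from any library.

```python
def noncharacters():
    """Return all Unicode noncharacter code points in ascending order."""
    # The contiguous block in the BMP: U+FDD0..U+FDEF (32 code points).
    points = list(range(0xFDD0, 0xFDEF + 1))
    # The last two code points of each of the 17 planes (34 code points).
    for plane in range(17):
        base = plane << 16
        points.append(base | 0xFFFE)
        points.append(base | 0xFFFF)
    return points


# Print the pairs for planes 3 through 13, the range the question asks about.
for plane in range(3, 14):
    base = plane << 16
    print(f"plane {plane:2d}: U+{base | 0xFFFE:04X}, U+{base | 0xFFFF:04X}")
```

Because the pattern is purely arithmetic, a lookup table is not strictly needed: the pair for plane `n` is `(n << 16) | 0xFFFE` and `(n << 16) | 0xFFFF`.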

1

vote

0 answers

Detecting non-character Unicode characters

I'm working on an application that eventually reads and prints arbitrary and untrusted Unicode characters to the screen. There are a number of ways to wreak havoc using Unicode strings, and I would like my program to behave correctly for…

swift unicode noncharacter

asked Jun 22 '15 at 21:16

zneak

134,922
42
253
328

1

vote

1 answer

Why are certain characters prohibited in the HTML5 spec?

According to the HTML5 spec (just after the table), the following characters are prohibited: Otherwise, return a character token for the Unicode character whose code point is that number. Additionally, if the number is in the range 0x0001 to…

html unicode specifications noncharacter

asked Apr 09 '15 at 10:31

Daniel Fath

16,453
7
47
82

0

votes

1 answer

How can I get a 'Group Separator' (0x1D) character from a TextBox, RichTextBox, etc. in C#?

I use a USB 2D barcode scanner to scan a GS1 DataMatrix and key in the barcode text via USB to a computer, like a keyboard. The text uses the 'Group Separator' (0x1D) character as a delimiter. When I put the cursor in a hex/text editor and then scan, the 'Group…

c# noncharacter

asked Apr 19 '21 at 09:03

Kritsada Tattanon

111
2
5

0

votes

1 answer

How do I go about converting text with control characters to properly formatted text in IntelliJ

I'm trying to take some text that's in a format where all the spacing, tabs, and newlines (control characters, NPCs) are present, and have it output to a file in IntelliJ formatted as those control characters would dictate. I may be going about…

regex intellij-idea replace escaping noncharacter

asked Mar 05 '20 at 21:17

RatavaWen

147
1
8

0

votes

1 answer

Is this Google Closure UTF-8 string valid?

The Google Closure UTF-8 to byte array tests include the string \u0000\u007F\u0080\u07FF\u0800\uFFFF, which is supposed to be converted to the array [0x00, 0x7F, 0xC2, 0x80, 0xDF, 0xBF, 0xE0, 0xA0, 0x80, 0xEF, 0xBF, 0xBF]. I've tried a few other…

javascript typescript utf-8 noncharacter

asked Aug 25 '18 at 02:47

James McLachlan

1,368
13
27
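
The conversion quoted in the question above can be checked independently. This is a small Python sketch using only the built-in `str.encode` and `bytes.decode`; the variable names are illustrative. The six code points are the smallest and largest values for one-, two-, and three-byte UTF-8 sequences, and U+FFFF, although a noncharacter, is still a Unicode scalar value, so a UTF-8 encoder is expected to accept it.

```python
# The test string from the question: boundary code points for
# 1-byte, 2-byte, and 3-byte UTF-8 sequences.
text = "\u0000\u007F\u0080\u07FF\u0800\uFFFF"

# The byte array quoted in the question.
expected = bytes([0x00, 0x7F, 0xC2, 0x80, 0xDF, 0xBF,
                  0xE0, 0xA0, 0x80, 0xEF, 0xBF, 0xBF])

encoded = text.encode("utf-8")

# Show the bytes in hex, then compare against the quoted array
# and confirm the bytes decode back to the original string.
print(" ".join(f"{b:02X}" for b in encoded))
print("matches quoted array:", encoded == expected)
print("round-trips:", encoded.decode("utf-8") == text)
```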

0

votes

1 answer

Strip invalid and noncharacters from utf8

I'm loading some data, processing it, then sending data to an application which (fair enough) doesn't allow the invalid UTF-8 noncharacters U+FDD0 through U+FDEF, as well as the invalid U+FFFE and U+FFFF special characters. My raw data is out of my…

python utf-8 noncharacter

asked Nov 16 '17 at 00:51

Amedee d'Aboville

95
7
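
One possible approach to the question above, sketched in Python under some assumptions: the input arrives as raw bytes, undecodable byte sequences may simply be dropped, and all 66 noncharacters (not only the ones named in the excerpt) should be removed. The helper names `is_noncharacter` and `clean_utf8` are hypothetical.

```python
def is_noncharacter(cp):
    """True for U+FDD0..U+FDEF and for the last two code points of any plane."""
    # (cp & 0xFFFE) == 0xFFFE holds exactly when the low 16 bits
    # are FFFE or FFFF, regardless of the plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def clean_utf8(raw):
    """Decode bytes as UTF-8, dropping invalid sequences and noncharacters."""
    # errors="ignore" silently discards bytes that are not valid UTF-8;
    # use errors="replace" instead to keep a U+FFFD marker in their place.
    text = raw.decode("utf-8", errors="ignore")
    return "".join(ch for ch in text if not is_noncharacter(ord(ch)))


# U+FDD0 encodes as EF B7 90, U+FFFF as EF BF BF; 0xFF is never valid UTF-8.
sample = b"a\xef\xb7\x90b\xffc\xef\xbf\xbf"
print(repr(clean_utf8(sample)))
```

Whether to drop or replace the offending data depends on the receiving application; dropping is the simpler choice here but loses the information that something was removed.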

0

votes

1 answer

Which Unicode code point can be used safely as a reserved value?

Background: I am writing a DFA-based regex parser. For performance reasons, I need to use a dictionary [Unicode.Scalar : State] to map the next states. Now I need a bunch of special Unicode values to represent special character expressions like .,…

swift unicode noncharacter

asked Aug 03 '17 at 00:35

dawnstar

507
5
10

0

votes

1 answer

Why are the two last points on supplemental PUAs excluded?

The supplemental PUAs (F0000-FFFFD and 100000-10FFFD) explicitly exclude FFFFE, FFFFF, 10FFFE and 10FFFF by defining them as noncharacters. Why was this done? Without this they would be nice 65536-point blocks.

unicode noncharacter

asked Jun 30 '16 at 12:50

skyking

13,817
1
35
57

0

votes

1 answer

Which nonnegative integers aren't assigned a character in the UCS?

Coded character sets, as defined by the Unicode Character Encoding Model, map characters to nonnegative integers (e.g. LATIN SMALL LETTER A to 97, both by traditional ASCII and the UCS). Note: There's a difference between characters and abstract…

unicode ucs noncharacter

asked Mar 26 '16 at 04:17

djsp

2,174
2
19
40
