Questions tagged [combining-marks]

In digital typography, combining characters (marks) are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents).

Unicode also contains many precomposed characters, so in many cases the same text can be written with either combining diacritics or precomposed characters, at the user's or application's choice. As a result, two Unicode strings must be normalized before they are compared, and encoding converters must be designed carefully so that every valid Unicode representation of a character maps correctly to the legacy encoding without data loss.
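As a minimal sketch of the normalization requirement, the Python snippet below compares a precomposed "é" with its decomposed form using the standard `unicodedata` module. The helper name `same_text` and the choice of NFC are assumptions made for illustration; NFD would work equally well as long as both sides use the same form.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # é as one precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The raw code point sequences differ, so a plain comparison fails.
print(precomposed == decomposed)

# After normalization both strings hold the same code points.
print(same_text(precomposed, decomposed))
```

Normalization only makes canonically equivalent sequences compare equal; it does not strip accents or handle case differences.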

In Unicode, the main block of combining diacritics for European languages and the International Phonetic Alphabet is U+0300–U+036F, and combining diacritical marks also appear in many other blocks. Diacritics are always placed after the main (base) character, so several diacritics can be added to the same character, although as of 2010 few applications rendered such combinations correctly.
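The ordering rule can be illustrated with a short Python sketch that stacks two marks after a base letter and then identifies them. The helper `is_mark` is a hypothetical name; treating general categories Mn, Mc and Me as "combining marks" is an assumption that matches the usual definition but may be broader or narrower than a given application needs.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True if ch has a mark category (Mn, Mc or Me); hypothetical helper."""
    return unicodedata.category(ch).startswith("M")


# Base letter first, then each combining mark in turn.
stacked = "a\u0301\u0323"  # a + COMBINING ACUTE ACCENT + COMBINING DOT BELOW

marks = [ch for ch in stacked if is_mark(ch)]
in_main_block = all(0x0300 <= ord(ch) <= 0x036F for ch in marks)

for ch in stacked:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), is_mark(ch))

print(len(stacked), len(marks), in_main_block)
```

The string renders as a single visible character but contains three code points, which is why length checks and character-by-character filters often surprise people working with such text.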

Link: http://en.wikipedia.org/wiki/Combining_character

33 questions

4 answers

What's up with these Unicode combining characters and how can we filter them?

กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้…

asked May 02 '12 at 13:34

XCS

5 answers

How can Z͎̠͗ͣḁ̵͙̑l͖͙̫̲̉̃ͦ̾͊ͬ̀g͔̤̞͓̐̓̒̽o͓̳͇̔ͥ text be prevented?

I've read about how Zalgo text works, and I'm looking to learn how a chat or forum software could prevent that kind of annoyance. More precisely, what is the complete set of Unicode combining characters that needs to: a) either be stripped, assuming…

javascript unicode diacritics combining-marks zalgo

asked Mar 09 '14 at 00:47

Dan Dascalescu

3 answers

What is the difference between ‘combining characters’ and ‘grapheme extenders’ in Unicode?

What is the difference between ‘combining characters’ and ‘grapheme extenders’ in Unicode? They seem to do the same thing, as far as I can tell – although the set of grapheme extenders is larger than the set of combining characters. I’m clearly…

unicode terminology grapheme combining-marks

asked Feb 12 '14 at 08:45

Mathias Bynens

1 answer

What component handles a Combining Diaeresis in a string?

I am working with a list of file names in Java. I observe that some single characters in the file names, like a, ö and ü actually consist of a sequence you could describe as two single ASCII chars following up: ö is represented by o, ¨ I see this by…

java string character-encoding unicode-normalization combining-marks

asked Nov 04 '15 at 10:34

peter_the_oak

0 answers

Does haskell support unicode combining characters?

I've looked at the lexical specification of Haskell, and I can see that a lowercase Unicode character is a valid variable name. However, I believe it is not legal to use modifiers with lowercase letters? Attempts so far suggest not. This stackoverflow post…

haskell unicode combining-marks

asked Mar 17 '17 at 16:51

OllieB

3 answers

How to compose syllable blocks with Hangul Jamo

I'm working on a project that would require the input of old Hangul syllable blocks (i.e. Hangul syllable blocks that would utilize obsolete characters such as ㆅ and ㅿ, located in the Hangul Compatibility Jamo unicode block), but I've been having…

input unicode ms-word asianfonts combining-marks

asked Nov 25 '15 at 06:36

crayondraw

2 answers

How do I compare characters with combining diacritic marks ɔ̃, ɛ̃ and ɑ̃ to unaccented ones in python (imported from a utf-8 encoded text file)?

Summary: I want to compare ɔ̃, ɛ̃ and ɑ̃ to ɔ, ɛ and a, which are all different, but my text file has ɔ̃, ɛ̃ and ɑ̃ written as ɔ~, ɛ~ and a~. I wrote a script which moves along the characters in two words simultaneously, comparing them to find the…

python string utf-8 diacritics combining-marks

asked Apr 30 '21 at 14:51

RukiyaMeria

2 answers

How do you determine the byte width of a UTF-16 character?

What are the rules for reading a UTF-16 byte stream, to determine how many bytes a character takes up? I've read the standards, but based on empirical observations of real-world UTF-16 encoded streams, it looks like there are certain where the…

unicode utf-16 combining-marks ucs

asked Apr 24 '21 at 15:40

Rab

1 answer

Unicode Font Rendering Difference in Firefox, Chrome, and Safari

I was working on importing content from some files when I encountered this issue. Some of the Unicode characters are rendered wrong in Chrome & Safari (no issues in Firefox). The symbol in question is: र्इ Screenshots from each browser below: …

google-chrome unicode fonts rendering combining-marks

asked Mar 05 '19 at 21:23

TheKalpit

4 answers

Highlighting Combining Characters

I'm trying to build a little system which highlights combining characters in a different color than regular characters. Take the following example: * { font-size: 72px } b { font-weight: normal; color: red } Tést A̴ B͓…

html css combining-marks

asked Oct 16 '14 at 15:12

p.s.w.g

1 answer

combine lists in a certain order c#

I have a WinForms project in C# that performs math operations. The string comes in like "=B10+B4*(B12-B8)", where B10 represents "3", B4 represents "10", B12 represents "6" and B8 represents "2". I want to convert this string to "=3+10*(6-2)". So math operation…

c# list winforms math combining-marks

asked Feb 21 '21 at 10:29

Gokhan

0 answers

What platforms support rendering the Unicode combining character ⃠ around existing emoji?

I'm trying to determine what platforms support adding the Unicode character COMBINING ENCLOSING CIRCLE BACKSLASH ( ⃠) around pre-existing emoji. The only documentation I can find about using this composing character with emoji is from Unicode…

unicode emoji platform combining-marks

asked Aug 05 '16 at 13:55

bskaggs

1 answer

Myanmar language regular expression showing unwanted character

$result = ဖန္တ $result = preg_replace( "/([\p{L}\p{N}A-Za-z0-9@#\".]{1,}[\p{L}\p{N}A-Za-z0-9\.\_-]{0,})/u", "foo[('$0')]bar", $result); print_r($result); //RESULT: foo[('ဖန')]bar္foo[('တ')]bar See bar္foo in…

php regex unicode preg-replace combining-marks

asked Apr 15 '15 at 16:29

Priy Ranjan

1 answer

detect any combining character in Java

I am looking for a way to detect if a character in a java string "is a combining character" or not. For instance, String khmerCombiningVowel = new String(new byte[]{(byte) 0xe1,(byte) 0x9f,(byte) 0x80}, "UTF-8"); // unicode 17c0 represents a…

java regex unicode combining-marks

asked Mar 17 '15 at 22:25

rogerdpack

1 answer

python isalpha doesn't handle unicode combining marks properly?

I encountered a weird Ukrainian word, Кири́лл. I converted it to unicode and tested it with isalpha, which returned False. I looked around and found that this word contains a character named 'combining acute accent'. So the letter и́ is actually a…

python unicode combining-marks

asked Feb 20 '14 at 22:15

user1785295
