Questions tagged [combining-marks]

Combining characters (marks) are characters that are intended to modify other characters.

In digital typography, combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents).

Unicode also contains many precomposed characters, so in many cases the same text can be written with either combining diacritics or precomposed characters, at the user's or application's choice. As a result, two Unicode strings must be normalized before they are compared, and encoding converters must be designed carefully so that every valid Unicode representation of a character maps correctly to the legacy encoding, avoiding data loss.
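As a rough illustration of why normalization matters before comparison, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `same_text` and the choice of NFC as the default form are assumptions made for this example, not something the tag description prescribes.

```python
import unicodedata

# Two ways to write "é": one precomposed code point, or a base
# letter followed by U+0301 COMBINING ACUTE ACCENT.
precomposed = "\u00e9"
decomposed = "e\u0301"

def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

print(precomposed == decomposed)           # False: different code point sequences
print(same_text(precomposed, decomposed))  # True: equal once both are normalized
```

Either NFC or NFD works for this comparison, as long as both strings are put through the same form.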

In Unicode, the main block of combining diacritics for European languages and the International Phonetic Alphabet is U+0300–U+036F. Combining diacritical marks are also present in many other blocks of Unicode characters. In Unicode, diacritics are always added after the main character, so it is possible to add several diacritics to the same character, although as of 2010, few applications support correct rendering of such combinations.
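The points above (marks follow their base character, several can stack, and they live in more than one block) can be sketched in Python roughly as follows. The function names are hypothetical, and treating the general categories Mn, Mc and Me as "combining" is an assumption of this example; other definitions exist.

```python
import unicodedata

def is_combining(ch):
    """Assumed definition: general category Mn, Mc or Me."""
    return unicodedata.category(ch) in ("Mn", "Mc", "Me")

def in_main_block(ch):
    """True if ch lies in the Combining Diacritical Marks block U+0300-U+036F."""
    return 0x0300 <= ord(ch) <= 0x036F

def count_marks(text):
    """Count the combining marks in text."""
    return sum(1 for ch in text if is_combining(ch))

# "a" followed by two stacked marks: acute accent and dot below.
stacked = "a\u0301\u0323"
print(count_marks(stacked))

# THAI CHARACTER SARA I is a combining mark outside the main block.
thai_mark = "\u0e34"
print(is_combining(thai_mark), in_main_block(thai_mark))
```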

Link: http://en.wikipedia.org/wiki/Combining_character

33 questions
93
votes
4 answers

What's up with these Unicode combining characters and how can we filter them?

กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ ก็็็็็็็็็็็็็็็็็็็็ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้ กิิิิิิิิิิิิิิิิิิิิ ก้้้้้้้้้้้้้้้้้้้้…
XCS
22
votes
5 answers

How can Z͎̠͗ͣḁ̵͙̑l͖͙̫̲̉̃ͦ̾͊ͬ̀g͔̤̞͓̐̓̒̽o͓̳͇̔ͥ text be prevented?

I've read about how Zalgo text works, and I'm looking to learn how a chat or forum software could prevent that kind of annoyance. More precisely, what is the complete set of Unicode combining characters that needs to: a) either be stripped, assuming…
Dan Dascalescu
15
votes
3 answers

What is the difference between ‘combining characters’ and ‘grapheme extenders’ in Unicode?

What is the difference between ‘combining characters’ and ‘grapheme extenders’ in Unicode? They seem to do the same thing, as far as I can tell – although the set of grapheme extenders is larger than the set of combining characters. I’m clearly…
Mathias Bynens
5
votes
1 answer

What component handles a Combining Diaeresis in a string?

I am working with a list of file names in Java. I observe that some single characters in the file names, like a, ö and ü, actually consist of a sequence you could describe as two single characters in a row: ö is represented by o, ¨ I see this by…
4
votes
0 answers

Does Haskell support Unicode combining characters?

I've looked at the lexical specification of Haskell, and I can see that a lowercase Unicode character is a valid variable name. I believe it is not, however, legal to use modifiers with lowercase letters? Attempts so far suggest not. This Stack Overflow post…
OllieB
4
votes
3 answers

How to compose syllable blocks with Hangul Jamo

I'm working on a project that would require the input of old Hangul syllable blocks (i.e. Hangul syllable blocks that use obsolete characters such as ㆅ and ㅿ, located in the Hangul Compatibility Jamo Unicode block), but I've been having…
crayondraw
3
votes
2 answers

How do I compare characters with combining diacritic marks ɔ̃, ɛ̃ and ɑ̃ to unaccented ones in Python (imported from a UTF-8 encoded text file)?

Summary: I want to compare ɔ̃, ɛ̃ and ɑ̃ to ɔ, ɛ and a, which are all different, but my text file has ɔ̃, ɛ̃ and ɑ̃ written as ɔ~, ɛ~ and a~. I wrote a script which moves along the characters in two words simultaneously, comparing them to find the…
RukiyaMeria
3
votes
2 answers

How do you determine the byte width of a UTF-16 character?

What are the rules for reading a UTF-16 byte stream, to determine how many bytes a character takes up? I've read the standards, but based on empirical observations of real-world UTF-16 encoded streams, it looks like there are certain where the…
Rab
3
votes
1 answer

Unicode Font Rendering Difference in Firefox, Chrome, and Safari

I was working on importing content from some files when I encountered this issue. Some of the Unicode characters are rendered incorrectly in Chrome and Safari (no issues in Firefox). The symbol in question is: र्इ Screenshots from each browser below: …
TheKalpit
3
votes
4 answers

Highlighting Combining Characters

I'm trying to build a little system which highlights combining characters in a different color than regular characters. Take the following example: * { font-size: 72px } b { font-weight: normal; color: red } Tést A̴ B͓…
p.s.w.g
2
votes
1 answer

Combine lists in a certain order in C#

I have a WinForms project in C# that performs math operations. The string comes in like "=B10+B4*(B12-B8)", where B10 represents "3", B4 represents "10", B12 represents "6" and B8 represents "2". I want to convert this string to "=3+10*(6-2)". So math operation…
Gokhan
2
votes
0 answers

What platforms support rendering the Unicode combining character ⃠ around existing emoji?

I'm trying to determine what platforms support adding the Unicode character COMBINING ENCLOSING CIRCLE BACKSLASH ( ⃠) around pre-existing emoji. The only documentation I can find about using this composing character with emoji is from Unicode…
bskaggs
2
votes
1 answer

Myanmar language regular expression showing unwanted character

$result = ဖန္တ $result = preg_replace( "/([\p{L}\p{N}A-Za-z0-9@#\".]{1,}[\p{L}\p{N}A-Za-z0-9\.\_-]{0,})/u", "foo[('$0')]bar", $result); print_r($result); //RESULT: foo[('ဖန')]bar္foo[('တ')]bar See bar္foo in…
Priy Ranjan
2
votes
1 answer

Detect any combining character in Java

I am looking for a way to detect whether a character in a Java string is a combining character or not. For instance, String khmerCombiningVowel = new String(new byte[]{(byte) 0xe1,(byte) 0x9f,(byte) 0x80}, "UTF-8"); // unicode 17c0 represents a…
rogerdpack
2
votes
1 answer

Python isalpha doesn't handle Unicode combining marks properly?

I encountered a weird Ukrainian word, Кири́лл. I converted it to Unicode and tested it with isalpha, which returned False. I looked around and found that this word contains a character named 'combining acute accent'. So the letter и́ is actually a…