Is it possible to find all "paired" characters programmatically?

E.g., given the < character, how do I find its corresponding "pair", the > character?

The following piece of code prints every "mirrored" ASCII character:

use 5.018;
use warnings;
use charnames qw(:full);
for my $n (0..127) {
    my $c = chr $n;
    printf "%02x: [%s] - %s\n", $n, $c, charnames::viacode($n) if $c =~ /\p{Bidi_Mirrored=Y}/;
}

prints:

28: [(] - LEFT PARENTHESIS
29: [)] - RIGHT PARENTHESIS
3c: [<] - LESS-THAN SIGN
3e: [>] - GREATER-THAN SIGN
5b: [[] - LEFT SQUARE BRACKET
5d: []] - RIGHT SQUARE BRACKET
7b: [{] - LEFT CURLY BRACKET
7d: [}] - RIGHT CURLY BRACKET

But AFAIK the Bidi_Mirrored property isn't the same as "paired" (i.e. left-right pairs). For example, the following character has the Bidi_Mirrored property but probably has no "pair":

∰  U+02230 VOLUME INTEGRAL

And even if Bidi_Mirrored is the correct property for "paired" characters, the question remains the same: how do I find the pair's code point (or name)?

In short: I want to print all Unicode "paired" characters, e.g. pairs like:

«  U+000AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
»  U+000BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

or

≤  U+02264 LESS-THAN OR EQUAL TO
≥  U+02265 GREATER-THAN OR EQUAL TO

etc...

EDIT

In the meantime the question got closed, so I am writing my findings here.

I found the following in the header of the Unicode data file BidiBrackets.txt:

# Bidi_Paired_Bracket is a normative property of type Miscellaneous,
# which establishes a mapping between characters that are treated as
# bracket pairs by the Unicode Bidirectional Algorithm.
#
# Bidi_Paired_Bracket_Type is a normative property of type Enumeration,
# which classifies characters into opening and closing paired brackets
# for the purposes of the Unicode Bidirectional Algorithm.
#
# This file lists the set of code points with Bidi_Paired_Bracket_Type
# property values Open and Close. The set is derived from the character
# properties General_Category (gc), Bidi_Class (bc), Bidi_Mirrored (Bidi_M),
# and Bidi_Mirroring_Glyph (bmg), as follows: two characters, A and B,
# form a bracket pair if A has gc=Ps and B has gc=Pe, both have bc=ON and
# Bidi_M=Y, and bmg of A is B. Bidi_Paired_Bracket (bpb) maps A to B and
# vice versa, and their Bidi_Paired_Bracket_Type (bpt) property values are
# Open (o) and Close (c), respectively.
#
# For legacy reasons, the characters U+FD3E ORNATE LEFT PARENTHESIS and
# U+FD3F ORNATE RIGHT PARENTHESIS do not mirror in bidirectional display
# and therefore do not form a bracket pair.
#
# The Unicode property value stability policy guarantees that characters
# which have bpt=o or bpt=c also have bc=ON and Bidi_M=Y. As a result, an
# implementation can optimize the lookup of the Bidi_Paired_Bracket_Type
# property values Open and Close by restricting the processing to characters
# with bc=ON
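
The header quoted above belongs to the Unicode data file BidiBrackets.txt, whose data lines have the layout `code point; paired code point; type # name`. As a minimal sketch of how such lines could be turned into an open-to-close lookup table, the snippet below parses a handful of sample lines embedded in the script. The helper name `parse_bracket_lines` is made up for illustration, and a real script would read the full file instead of the embedded sample.

```perl
use strict;
use warnings;

# A few sample lines in the BidiBrackets.txt layout:
#   code point; paired code point; type (o = open, c = close) # name
my @sample = (
    '# comment lines start with a hash sign',
    '0028; 0029; o # LEFT PARENTHESIS',
    '0029; 0028; c # RIGHT PARENTHESIS',
    '005B; 005D; o # LEFT SQUARE BRACKET',
    '005D; 005B; c # RIGHT SQUARE BRACKET',
);

# Hypothetical helper: returns a hash mapping each opening bracket
# character to its closing partner.
sub parse_bracket_lines {
    my @lines = @_;
    my %close_for;
    for my $line (@lines) {
        (my $data = $line) =~ s/\s*#.*//;    # strip comments
        next unless $data =~ /^([0-9A-F]+);\s*([0-9A-F]+);\s*([oc])\s*$/i;
        my ($cp, $paired, $type) = ($1, $2, lc $3);
        $close_for{ chr hex $cp } = chr hex $paired if $type eq 'o';
    }
    return %close_for;
}

my %close_for = parse_bracket_lines(@sample);
binmode STDOUT, ':encoding(UTF-8)';
printf "%s pairs with %s\n", $_, $close_for{$_} for sort keys %close_for;
```

Note that this file only covers brackets (gc=Ps/Pe), so pairs such as « » or ≤ ≥ would not appear in it.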

It looks like the exact algorithm is described here, but I don't know how to get the Bidi_Mirroring_Glyph (bmg) and Bidi_Paired_Bracket (bpb) values in Perl. AFAIK Unicode::UCD doesn't contain these values, or at least I don't know how to get them.
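
For what it's worth, the Unicode::UCD shipped with Perl 5.22 and later has a `charprop()` function that takes a code point and a property name, so it may be one way to look up Bidi_Mirroring_Glyph. The following is only a sketch under that assumption: the helper `mirror_partner` is a made-up name, and the code assumes `charprop()` returns the partner as a character and an empty string when there is none, which is worth checking against the installed version.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charprop);    # charprop() needs Perl 5.22 or later

# Hypothetical helper: returns the mirror partner of $char, or nothing
# if the character has no Bidi_Mirroring_Glyph.
sub mirror_partner {
    my ($char) = @_;
    my $partner = charprop(ord $char, 'Bidi_Mirroring_Glyph');
    # Assumption: "no partner" shows up as undef or an empty string.
    return unless defined $partner && length $partner;
    # Guard in case the lookup falls back to the character itself.
    return if $partner eq $char;
    return $partner;
}

binmode STDOUT, ':encoding(UTF-8)';
# ( < « ≤ and the VOLUME INTEGRAL from the question
for my $cp (0x28, 0x3C, 0xAB, 0x2264, 0x2230) {
    my $partner = mirror_partner(chr $cp);
    printf "U+%04X -> %s\n", $cp,
        defined $partner
            ? sprintf('U+%04X', ord $partner)
            : 'no mirroring glyph';
}
```

Bidi_Mirroring_Glyph is used here rather than Bidi_Paired_Bracket because the latter is restricted to brackets, while pairs like « » and ≤ ≥ are not brackets.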

Maybe in the 5.024 and with Unicode 8.0? :) :)

clt60
  • Related: [List of all unicode's open/close brackets?](http://stackoverflow.com/questions/13535172/list-of-all-unicodes-open-close-brackets) – nwellnhof Oct 23 '15 at 16:35
  • the accepted answer to that question seems to have all the information needed here (and points out the difficulties) – ysth Oct 23 '15 at 16:52
  • For reference I added some more findings about it into the question. – clt60 Oct 23 '15 at 19:57
