I'm using Pango to typeset Devanagari. Consider the string उम्कन्छौ consisting of DEVANAGARI LETTER U, DEVANAGARI LETTER MA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER KA, DEVANAGARI LETTER NA, DEVANAGARI SIGN VIRAMA, DEVANAGARI LETTER CHA, DEVANAGARI VOWEL SIGN AU. When typesetting this string, I want to know the starting point of छ (CHA) to put a visual mark.

For ordinary strings I would measure the width of the preceding part, उम्कन्, but this doesn't work here: as you can see, न् (half न) combines with छ, so the result is slightly off.

Is there a way to obtain the correct starting point of a letter when combinations are involved?

I've tried querying the Pango layout using index_to_pos(), but it seems to operate on byte offsets rather than character offsets.

This small Perl program shows the problem. The vertical line is off to the right.

use strict;
use warnings;
use utf8;
use Cairo;
use Pango;

my $surface = Cairo::PdfSurface->create ("out.pdf", 595, 842);
my $cr = Cairo::Context->create ($surface);
my $layout = Pango::Cairo::create_layout($cr);
my $font = Pango::FontDescription->from_string('Lohit Devanagari');
$layout->set_font_description($font);

# Two parts of the phrase. Phrase1 ends in न् (half न).
my $phrase1 = 'उम्कन्';
my $phrase2 = 'छौ';

# Set the first part of the phrase, and get its width.
$layout->set_markup($phrase1);
my $w = ($layout->get_size)[0]/1024;

# Set the complete phrase.
$layout->set_markup($phrase1.$phrase2);

my ($x, $y ) = ( 100, 100 );

# Show phrase.
$cr->move_to( $x, $y );
$cr->set_source_rgba( 0, 0, 0, 1 );
Pango::Cairo::show_layout($cr, $layout);

# Show marker at width.
$cr->set_line_width(0.25);
$cr->move_to( $x + $w, $y-10 );
$cr->line_to( $x + $w, $y+50 );
$cr->stroke;

$cr->show_page;
Ether
The question would benefit from a minimal runnable program showing how the visual mark ends up in the wrong place. Potential answerers should not each have to spend time coming up with a demo, and risk getting or understanding it wrong. Please [edit the question](https://stackoverflow.com/posts/58395294/edit) to amend. Also see section *Help others reproduce the problem* in . ––– If it's Perl, I'll be glad to investigate. – daxim Oct 15 '19 at 16:33

1 Answer

You cannot measure a partial rendering, because shaping can join characters across the split point (here न् combines with छ). Instead, measure the whole rendering and iterate over the string grapheme-wise to find the position. Also see: https://gankra.github.io/blah/text-hates-you/#style-can-change-mid-ligature

use strict;
use warnings;
use utf8;
use Cairo;
use Pango;
use List::Util qw(uniq);
use Encode qw(encode);

my $surface = Cairo::PdfSurface->create('out.pdf', 595, 842);
my $cr = Cairo::Context->create ($surface);
my $layout = Pango::Cairo::create_layout($cr);
my $font = Pango::FontDescription->from_string('Lohit Devanagari');
$layout->set_font_description($font);

# Lay out the complete phrase once; all positions are queried from this layout.
my $phrase = 'उम्कन्छौ';
my @octets = split '', encode 'UTF-8', $phrase; # index_to_pos operates on octets
$layout->set_markup($phrase);
my ($x, $y) = (100, 100);
$cr->move_to($x, $y);
$cr->set_source_rgba(0, 0, 0, 1);
Pango::Cairo::show_layout($cr, $layout);
$cr->set_line_width(0.25);

# Query the x position at every byte index, convert from Pango units
# (divide by 1024, i.e. PANGO_SCALE) and collapse repeated positions.
my @offsets = uniq map { $layout->index_to_pos($_)->{x}/1024 } 0..$#octets;
# (0, 9.859375, 16.09375, 27.796875, 33.953125, 49.1875)

# Draw a short tick at every distinct position.
for my $offset (@offsets) {
    $cr->move_to($x+$offset, $y-5);
    $cr->line_to($x+$offset, $y+25);
    $cr->stroke;
}

# Draw a longer marker at the first grapheme cluster that starts with छ.
# This relies on @offsets and @graphemes lining up index by index.
my @graphemes = $phrase =~ /\X/g; # qw(उ म् क न् छौ)
while (my ($idx, $g) = each @graphemes) {
    if ($g =~ /^छ/) {
        $cr->move_to($x+$offsets[$idx], $y-10);
        $cr->line_to($x+$offsets[$idx], $y+50);
        $cr->stroke;
        last;
    }
}
$cr->show_page;
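
Since index_to_pos() takes a byte index, it may also help to compute the byte offset of छ directly instead of matching grapheme clusters against the list of unique offsets. The following is only a minimal sketch of that conversion in plain Perl; `char_to_byte_index` is a hypothetical helper name, not part of Pango, and it assumes the layout text is the UTF-8 encoding of the same string.

```perl
use strict;
use warnings;
use utf8;
use Encode qw(encode);

# Hypothetical helper (not a Pango function): convert a character offset
# in $string into the corresponding byte offset of its UTF-8 encoding.
sub char_to_byte_index {
    my ($string, $char_index) = @_;
    my $prefix = substr($string, 0, $char_index);
    return length(encode('UTF-8', $prefix));
}

my $phrase     = 'उम्कन्छौ';
my $char_index = index($phrase, 'छ');                       # character offset of CHA
my $byte_index = char_to_byte_index($phrase, $char_index);  # byte offset of CHA

print "char index: $char_index, byte index: $byte_index\n";
# prints "char index: 6, byte index: 18"
```

The resulting byte index could then be passed to `$layout->index_to_pos($byte_index)` and its `x` value divided by 1024, as in the program above. Whether that position matches the visual start of छ inside a conjunct presumably still depends on how the font shapes the cluster, so it is worth checking the result against the rendered output.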
daxim
  • Nifty, and very useful! I found that in {many,most,all} situations correct results were obtained by measuring the length of the _complete_ phrase and subtracting the length of the _second_ part of the phrase. Your solution is provably correct. – Johan Vromans Oct 17 '19 at 12:35