Thai is a very special language. You can write vowels (32 in total) right after the consonant, as in any other language, or IN FRONT of it, or ON TOP of it, or BELOW it (OK, only the short and long "u" sounds go below, but anyway...).

Furthermore, there are other modifiers (the 4 tone markers, the ga-ran, the mai-tai-ku and other ones) that can go ON TOP of an already existing vowel!

For example:

 ที่ดีที่สุด (the best)

As you can see, if I print it with a monospaced font, the "real length" is 5 characters, but all the UTF-8 strlen routines give me back 11 characters. That is TOTALLY CORRECT, but I need to know the "actual space" that the string will use on screen or on a printer when printed monospaced.

Sure, an easy solution would be to list all the special characters that go above or below the base character, and subtract them from the total count.

Since I am not sure I can find all the special characters, is there already a routine in any language that I can translate to Delphi?

Thank you

ZioBit
  • What you are asking for is the Grapheme size that a font would render visually after processing Unicode combining codepoints. You are not going to find anything for that in the Delphi RTL. And since it is directly tied to font usage, you need something like VCL's `TCanvas.TextExtent()` or FMX's `TCanvas.Text(Width|Height)()` method(s), with a Unicode font loaded in the `TCanvas.Font` (or use the Win32 [`GetTextExtentPoint32()`](https://msdn.microsoft.com/en-us/library/dd144938.aspx) function directly). – Remy Lebeau Oct 05 '17 at 06:52
  • Thank you, I am just afraid that simulating writing the font would be a fairly long process, but I'll do that. BTW, why didn't you answer? I cannot accept a comment :) – ZioBit Oct 05 '17 at 09:23
  • When you want to address a comment to someone other than the asker or the responder, precede the addressee's name with the at (@) character, like in: @RemyLebeau I think your comment would be great to enter as an answer – Tom Brunberg Oct 05 '17 at 11:13
  • If by *actual space* you mean the pixel width and height, @RemyLebeau has given you the solution with `GetTextExtentPoint32`, which says that (using the CordialUPC font at size 20 you've used) the string is 57 pixels wide by 37 pixels high at the default 96 pixels per inch of a form. Scaling to a Printer.Canvas dimension is a matter of using the Printer.Canvas.Handle instead of the form's Canvas.Handle as the HDC parameter. – Ken White Oct 05 '17 at 18:43
  • The codepoints that are meant to be rendered with a base character are called "combining characters". However, that isn't an exact solution either, because there are other typesetting characters that aren't rendered to a grapheme, such as line feed and zero-width space. Also, some diacritics are composed into single codepoints with some Latin letters. See the [Unicode FAQ](http://www.unicode.org/faq/char_combmark.html). – Tom Blodget Oct 06 '17 at 10:23

1 Answer

In C++:

    #include <tchar.h>

    /*---------------------------------------------------------------------------*/
    /*                              thai_tcslen                                  */
    /*---------------------------------------------------------------------------*/
    /* Counts the characters that take up a column of their own. Assumes an ANSI */
    /* build (_TCHAR is char) and a TIS-620 / Windows-874 encoded buffer.        */
    long thai_tcslen(const _TCHAR *buff)
    {
      long normal_length = (long)_tcslen(buff);
      long thai_length = 0;

      for (long bufpos = 0; bufpos < normal_length; ++bufpos) {
        /* Skip the marks drawn above or below a consonant; count the rest. */
        if (   buff[bufpos] != _T('\xD1')/*mai han akat  *//*-047*/
            && buff[bufpos] != _T('\xD4')/*sara i        *//*-044*/
            && buff[bufpos] != _T('\xD5')/*sara ii       *//*-043*/
            && buff[bufpos] != _T('\xD6')/*sara ue       *//*-042*/
            && buff[bufpos] != _T('\xD7')/*sara uee      *//*-041*/
            && buff[bufpos] != _T('\xD8')/*sara u        *//*-040*/
            && buff[bufpos] != _T('\xD9')/*sara uu       *//*-039*/
            && buff[bufpos] != _T('\xE7')/*mai tai khu   *//*-025*/
            && buff[bufpos] != _T('\xE8')/*mai ek        *//*-024*/
            && buff[bufpos] != _T('\xE9')/*mai tho       *//*-023*/
            && buff[bufpos] != _T('\xEA')/*mai tri       *//*-022*/
            && buff[bufpos] != _T('\xEB')/*mai chattawa  *//*-021*/
            && buff[bufpos] != _T('\xEC')/*ka ran        *//*-020*/
            ) {
          ++thai_length;
        }
      }

      return thai_length;
    } /* thai_tcslen */

In VB6:

    Public Function ThaiStringLength(ByRef ThaiString As String) As Long
      ' The byte values are the TIS-620 / Windows-874 codes of the marks drawn
      ' above or below a consonant. Assumes the system ANSI code page is Thai.
      Dim b As String, noLengthChars(12) As Byte
      b = ThaiString

      noLengthChars(0) = 209
      noLengthChars(1) = 212
      noLengthChars(2) = 213
      noLengthChars(3) = 214
      noLengthChars(4) = 215
      noLengthChars(5) = 216
      noLengthChars(6) = 217
      noLengthChars(7) = 231
      noLengthChars(8) = 232
      noLengthChars(9) = 233
      noLengthChars(10) = 234
      noLengthChars(11) = 235
      noLengthChars(12) = 236

      ' Strip every zero-width mark; what remains is the displayed length.
      Dim o As Long
      For o = 0 To 12
        b = Replace(b, Chr(noLengthChars(o)), "")
      Next
      ThaiStringLength = Len(b)
    End Function
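
Both routines above work on single-byte Thai text. Since the question mentions UTF-8, here is a rough sketch of the same idea on Unicode code points, which may be easier to port to Delphi. The function names `is_thai_nonspacing` and `thai_display_columns` are made up for this example. It assumes well-formed UTF-8 and that every remaining code point takes exactly one column, so it ignores things like zero-width space or non-Thai combining marks. The ranges cover the Thai nonspacing marks in the Unicode Thai block (U+0E31, U+0E34..U+0E3A, U+0E47..U+0E4E), which is slightly more than the 13 characters listed above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true for Thai code points that are drawn above or
// below a base consonant and so take no column of their own.
bool is_thai_nonspacing(std::uint32_t cp)
{
    return cp == 0x0E31
        || (cp >= 0x0E34 && cp <= 0x0E3A)
        || (cp >= 0x0E47 && cp <= 0x0E4E);
}

// Hypothetical helper: counts monospaced columns in a UTF-8 string.
// Assumption: the input is well-formed UTF-8 and every code point that is
// not a Thai nonspacing mark occupies exactly one column.
std::size_t thai_display_columns(const std::string& utf8)
{
    std::size_t columns = 0;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp;
        std::size_t len;
        // The lead byte gives the sequence length and the top bits.
        if (lead < 0x80)      { cp = lead;        len = 1; }
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }
        else                  { cp = lead & 0x07; len = 4; }
        // Fold in the low six bits of each continuation byte.
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        if (!is_thai_nonspacing(cp)) {
            ++columns;
        }
        i += len;
    }
    return columns;
}
```

For the string in the question, ที่ดีที่สุด, this counts 5 columns out of 11 code points. It is still only an approximation: for the real width with a given font you need the text-extent approach from the comments.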