Questions tagged [thai]

The national language of Thailand whose unique script presents various challenges when processing or rendering text:

  • Non-spacing / combining marks for some vowels and tone marks
  • Some tone marks stack vertically
  • Some vowel letters are not rendered visually in the same order they are stored logically as data
  • Spaces are not used between words
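As a rough illustration of the first two points, here is a minimal sketch using only Python's standard unicodedata module. The sample word is an assumption chosen for illustration (it is written with escapes so the code points are unambiguous); it carries a vowel sign and a tone mark stacked on the same consonant.

```python
import unicodedata

# Hypothetical sample: the Thai word for "when", spelled out as code points.
# SARA E, MO MA, SARA UEE, MAI EK, O ANG
word = "\u0E40\u0E21\u0E37\u0E48\u0E2D"

# Print each stored code point with its general category and Unicode name.
# "Mn" is a non-spacing mark, "Lo" is an ordinary letter.
for ch in word:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# Collect the non-spacing marks: here a vowel sign and a tone mark,
# both of which are drawn above the preceding consonant.
marks = [ch for ch in word if unicodedata.category(ch) == "Mn"]
categories = [unicodedata.category(ch) for ch in word]
```

The string holds five code points, but two of them take up no horizontal space of their own, which is why naive length and indexing operations tend to surprise people.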
106 questions
14
votes
1 answer

Error While Renaming An Extracted Zip File To Other Languages In PHP

I use the PHP ZipArchive class to extract a .zip file. It works fine for English, but causes problems in my local language (Thai). I use iconv('utf-8','windows-874',$zip->getNameIndex($i)) to convert UTF-8 to Thai. It works for folder and file names, but…
9
votes
5 answers

Regular Expression to accept all Thai characters and English letters in Python

I need to vectorize text documents in Thai (e.g. Bag of Words, doc2vec). First I want to go over each document, omitting everything except the Thai characters and English words (e.g. no punctuation, no numbers, no other special characters except…
Shani Shalgi
  • 617
  • 1
  • 6
  • 19
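One possible approach to this kind of filtering, sketched with Python's standard re module. The character ranges are an assumption, not something stated in the question: U+0E01 to U+0E3A and U+0E40 to U+0E4E cover Thai consonants, vowels and tone marks while leaving out the baht sign, Thai digits and most Thai punctuation. Keeping spaces is also an assumption, and keep_thai_and_english is a hypothetical helper name.

```python
import re

# Assumed ranges: Thai letters, vowel signs and tone marks, plus ASCII
# letters and the space character. Everything else is treated as noise.
NOT_THAI_OR_ENGLISH = re.compile(r"[^\u0E01-\u0E3A\u0E40-\u0E4E a-zA-Z]+")


def keep_thai_and_english(text: str) -> str:
    """Replace every run of disallowed characters with one space, then tidy up."""
    cleaned = NOT_THAI_OR_ENGLISH.sub(" ", text)
    # Collapse the repeated spaces left behind by the substitution.
    return " ".join(cleaned.split())


# "Hello, <Thai greeting> 123!" with the Thai part written as escapes.
sample = "Hello, \u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35 123!"
result = keep_thai_and_english(sample)
```

Because Thai does not put spaces between words, the output still needs a word segmenter before it can feed a bag-of-words model; this sketch only handles the character filtering step.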
8
votes
3 answers

Splitting Thai text by characters

Not by word boundaries, that is solvable. Example: #!/usr/bin/env python3 text = 'เมื่อแรกเริ่ม' for char in text: print(char) This produces: เ ม อ แ ร ก เ ร ม Which obviously is not the desired output. Any ideas? A portable…
josifoski
  • 1,696
  • 1
  • 14
  • 19
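A minimal sketch of one way to keep marks attached to their base letter, using only the standard library. It is an approximation, not a full implementation of Unicode grapheme cluster rules: it simply glues every non-spacing or enclosing mark onto the character before it. split_clusters is a hypothetical name, and the sample string is the word from the question written as escapes.

```python
import unicodedata


def split_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        # Mn = non-spacing mark, Me = enclosing mark: attach to previous cluster.
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# The sample word from the question, as explicit code points.
text = (
    "\u0E40\u0E21\u0E37\u0E48\u0E2D"  # first syllable
    "\u0E41\u0E23\u0E01"              # second syllable
    "\u0E40\u0E23\u0E34\u0E48\u0E21"  # third syllable
)

for cluster in split_clusters(text):
    print(cluster)
```

For this input the 13 stored code points come out as 9 printed units, with the vowel signs and tone marks staying on their consonants. Spacing vowels such as SARA AM are not merged by this sketch, so a dedicated segmentation library is the safer choice for production text.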
7
votes
1 answer

DateTime.ParseExact problem with Thai / Buddhist Era Time

After a client downloads a file from our server with our app, the app does a ParseExact on a date string which comes down from the server in the form: yyyy/mm/dd HH:mm:ss. After a lot of confusion, I noticed in some logs that the date on the clients…
James R
  • 651
  • 1
  • 12
  • 21
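For readers unfamiliar with the Buddhist Era, the year number is 543 higher than the Gregorian year. The sketch below shows that arithmetic in Python rather than .NET, so it is not the asker's code; parse_buddhist_era is a hypothetical helper, and it assumes the timestamp starts with a four-digit Buddhist Era year followed by a slash.

```python
from datetime import datetime

# Offset between Buddhist Era and Gregorian year numbers.
BUDDHIST_ERA_OFFSET = 543


def parse_buddhist_era(stamp: str) -> datetime:
    """Parse 'yyyy/MM/dd HH:mm:ss' where yyyy is a Buddhist Era year."""
    year_text, rest = stamp.split("/", 1)
    gregorian_year = int(year_text) - BUDDHIST_ERA_OFFSET
    # Shift the year before parsing so leap days are validated against
    # the Gregorian year rather than the Buddhist Era number.
    return datetime.strptime(f"{gregorian_year:04d}/{rest}", "%Y/%m/%d %H:%M:%S")


parsed = parse_buddhist_era("2559/02/29 08:15:00")
```

Shifting the year first matters for 29 February: 2559 is not divisible by four, so parsing the raw string and subtracting afterwards would reject a date that is valid in 2016.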
7
votes
2 answers

Korean, Thai and Indonesian POS tagger

Can someone recommend an open source POS tagger for Korean, Indonesian, Thai and Vietnamese that I can use to tag the corpus data that I currently have (e.g. the stanford-postagger)? If you are a dev and care to share and let me test out the POS…
alvas
  • 115,346
  • 109
  • 446
  • 738
5
votes
2 answers

How to handle height of Thai text in UILabel?

I’m working on an application we’re currently translating to Thai. Everything went smoothly when we tested the app on iOS 7, but on iOS 8, some accents were clipped by UILabels. We’re using Auto-Layout to layout all of the elements of the…
Frizlab
  • 846
  • 9
  • 30
4
votes
0 answers

Thai line breaks on iOS/Android

I'm working on Thai localization for the mobile game in Unity. The problem occurs when I send Thai text via push notifications, as the system does not break lines properly. It breaks only by spaces, but fails to recognize actual words. I assume…
4
votes
1 answer

Real length of a Thai UTF-8 encoded string in Delphi

Thai is a very special language. You can write vowels (32 in total) as in any other language right after the consonant, or IN FRONT of it, or ON TOP of it, or ON THE BOTTOM of it (ok, just the short and long "u" sound can go on the bottom, but…
ZioBit
  • 905
  • 10
  • 29
4
votes
1 answer

Thai fonts rendering

Currently we need to display Thai in our game, which uses the cocos2dx 2.x game engine. But some fonts are not displayed correctly. Original text: ยินดีต้อนรับสู่{p0} ขอให้ท่านเล่นเกมให้สนุก Displayed in VS Code (correct in VS Code): Displayed in sublime…
Eddy Lin
  • 475
  • 2
  • 6
  • 19
4
votes
2 answers

Multi-Language ElasticSearch Support

I am indexing messages from all around the world but mainly Thailand. The indexed messages will most likely contain either English or Thai. Does anyone know the best way to set the ES index so that it will return good search results for both Thai…
Rob Evans
  • 6,750
  • 4
  • 39
  • 56
4
votes
1 answer

How to display Thai diacritics properly on Android?

A short preface. Thai script has vowel signs that may appear above the consonants, and there are also diacritic signs (DS) that appear above the consonants; when both a vowel and a DS are present, they appear one above the other, so the vowel is set above…
Alexander Dunaev
  • 980
  • 1
  • 15
  • 40
4
votes
3 answers

Number of characters in Java String

Possible Duplicate: Java: length of string when using unicode overline to display square roots? How do I get number of Unicode characters in a String? Given a char[] of Thai characters: [อ, ภ, ิ, ช, า, ต, ิ] This comes out in String…
datacrush
  • 43
  • 1
  • 4
3
votes
2 answers

Does Orbeon Form support the Thai language?

I am new to Orbeon Form and would like to use it. I tried the form examples on the Orbeon Form web site and entered some data in Thai. The fields do accept Thai input. But when I try to generate…
Pearapon Joe
  • 869
  • 1
  • 8
  • 8
3
votes
2 answers

How to handle combining characters along with the \p{L} pattern for Thai strings?

I need to detect text with Unicode characters restricting it to letters only (e.g. no symbols, emojis, etc., just something that can be used in a person's name in any Unicode language). The \p{L} category seems to do the trick, but it does not…
Alek Davis
  • 10,628
  • 2
  • 41
  • 53
3
votes
2 answers

In Python 3, count Thai character positions

FIRST, I've used the Python 3 grapheme library to solve my problem. (For a bit more about grapheme, see this article). But I'm surprised that Python 3 couldn't do this without a specialized library... I resorted to grapheme because after many web…
RBV
  • 1,367
  • 1
  • 14
  • 33