Questions tagged [southeast-asian-languages]

Languages of Southeast Asia present a set of unique problems. Most have their own writing systems and require complex script support. Most don't use spaces between words; some, such as Vietnamese, put a space between every syllable instead. This tag covers Burmese (Myanmar), Khmer (Cambodian), Lao, Thai, and Vietnamese.
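To make the spacing problem concrete, here is a minimal Python sketch showing why splitting on whitespace does not yield words in these languages. The sample sentences and the helper name `whitespace_tokens` are illustrative assumptions, not taken from any question under this tag.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split on runs of whitespace, the usual shortcut for space-delimited text."""
    return text.split()


# Illustrative sample strings (assumptions chosen for this sketch).
thai = "ภาษาไทยไม่มีช่องว่างระหว่างคำ"  # roughly: "Thai has no spaces between words"
vietnamese = "Tôi học tiếng Việt"       # "I study Vietnamese"

thai_tokens = whitespace_tokens(thai)
vietnamese_tokens = whitespace_tokens(vietnamese)

# The Thai sentence contains no whitespace, so it comes back as one chunk.
print(len(thai_tokens))
# Vietnamese splits into syllables, not words.
print(vietnamese_tokens)
```

The Thai sentence comes back as a single token, while the Vietnamese one splits into four syllables even though "tiếng Việt" is one word. Real word segmentation for these scripts generally needs a dictionary or a rule-based or statistical segmenter, which is what several of the questions below are about.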

Web Font Resources

Google has a little-known resource called Early Access fonts, which is reachable from the "More scripts" link at the upper right of the main Google Fonts page.

As of 09/2014, numerous web fonts are available in the Early Access resource, offering support for the following languages and scripts:

Hebrew      Lao       Ethiopic   Tamil      Thai      Korean
Arabic      Bengali   Hindi      Myanmar    Armenian  Cherokee
Georgian    Gujarati  Gurmukhi   Japanese   Kannada   Khmer
Malayalam   Osmanya   Telugu     Chinese (traditional)

Using a Khmer Google Web Font

<link rel="stylesheet" href="http://fonts.googleapis.com/css?family=Suwannaphum">
<p style="font-family: Suwannaphum,DaunPenh,Code2000;font-size:130%;">ខ្ញុំអាចញ៉ាំកញ្ចក់បាន ដោយគ្មានបញ្ហា</p>



40 questions
15
votes
3 answers

A Viable Solution for Word Splitting Khmer?

I am working on a solution to split long lines of Khmer (the Cambodian language) into individual words (in UTF-8). Khmer does not use spaces between words. There are a few solutions out there, but they are far from adequate (here and here), and…
9
votes
1 answer

What is causing the message "Enable myanmar Zawgyi converter"

I am new to Android and currently running some sample apps. From the logcat, I noticed the message "10-01 20:14:26.536: D/TextLayoutCache(15027): Enable myanmar Zawgyi converter" and wonder what could be causing this. Is that an error of some sort?
clearstake
  • 507
  • 1
  • 5
  • 9
6
votes
1 answer

Android Vietnamese Text to Speech?

I am looking for a way to develop an Android app that speaks Vietnamese text aloud. As far as I know, there is no Vietnamese TTS installed by default. So is there any Vietnamese TTS engine for Android out there? One more thing: I pretend even I have…
JatSing
  • 4,857
  • 16
  • 55
  • 65
6
votes
2 answers

Vietnamese Unicode Text Search in SQLite

I am planning to write an iOS app which uses SQLite as the backend. My database contains Vietnamese text such as "Hải Sơn". The users, being used to Google search, want to enter a search term like "hai son" in order to find the text above. I tried…
Hai Vu
  • 37,849
  • 11
  • 66
  • 93
3
votes
1 answer

Manipulating Thai Characters in PHP

I'm struggling to get Thai characters and PHP working together. This is what I'd like to do: But instead of giving me the first character of $string (ท), I just get…
3
votes
1 answer

Undefined offsets and diacritical marks

I'm trying to parse Laotian text with utf8_ireplace and I'm getting an undefined offset notice. The one thing I can see is that there are diacritical marks. Would that cause that warning? Or can someone give me a clue of why it would always be…
Elin
  • 6,507
  • 3
  • 25
  • 47
2
votes
2 answers

How to create characters map for Khmer Unicode in FPDF?

I want to create a character map for Khmer Unicode in FPDF, as exists for other Unicode scripts, so that my Khmer Unicode text will be supported in FPDF. But I don't know how. Here is the link to my Unicode characters: http://unicode.org/charts/nameslist/n_1780.html#1780. …
Sophy SEM
  • 223
  • 5
  • 19
2
votes
3 answers

Khmer Unicode in iText

I'm very new to iText. I want to display Khmer Unicode in iText, but I can't get it to work. Does anyone know how to do it? Please advise me. Regards, LeeJava
leejava
  • 339
  • 1
  • 3
  • 11
2
votes
0 answers

Using icu::RuleBasedBreakIterator with hardcoded rules

I'm trying to use an ICU RuleBasedBreakIterator in C++ for segmenting Lao text into syllables. ICU has corresponding rules for Thai, which is "same same but different". The SOLR folks have something working in Java that I could get the rules from…
mbethke
  • 935
  • 8
  • 19
2
votes
1 answer

Vietnamese Unicode text in SQLite

I know that this question has been asked on this site, but none of the answers solved the problem, so I am asking again here. I created a SQLite db (via the Firefox SQLite Manager tool), and the data is stored in Vietnamese. CREATE TABLE "customer"…
Bentley
  • 143
  • 10
2
votes
2 answers

Can I use CSS "unicode-range" to specify a font across an entire (third party) page?

I've never become fluent with CSS but I don't think I had this situation before. I'm thinking of using stylish to add CSS to a third-party site over which I have no direct control. So the HTML and CSS is not really set up for the kind of…
hippietrail
  • 15,848
  • 18
  • 99
  • 158
2
votes
1 answer

Vim 7.4 binary for Windows, can't process Thai characters. No multi_byte, has multi_byte_ime/dyn instead?

Like at least one other Vim / gVim 7.4 for Windows user, I'm going 'round and 'round in circles trying to get gVim to properly display Unicode. In my case, I have a .py file that contains Thai characters. For example, เมษายน. If in Windows 7…
RBV
  • 1,367
  • 1
  • 14
  • 33
2
votes
2 answers

how to export Vietnamese text to PDF using iText

I'm facing a problem when trying to export a Vietnamese document as PDF using iText. I put Vietnamese words in .xml file like this T\u1ED5 ch\u1EE9c tham…
Chi Nguyen
  • 89
  • 1
  • 8
1
vote
1 answer

Split string with split_part in Asian language

I have a column with Asian addresses. I want to extract the substring up to the first whitespace. However, this does not work here. My suspicion is that it has to do with the Asian language, but I do not know why or how to deal with this issue. That's…
Florian Seliger
  • 421
  • 4
  • 16
1
vote
0 answers

Is it possible to get word boundaries of SE Asian scripts via JavaScript?

My goal is to break SE Asian texts into words, preferably from within a browser. While this is trivial to do for western languages using regex or simply splitting on spaces, it's a much tougher problem for some scripts. E.g. find the word boundaries…
Mark Wilbur
  • 2,809
  • 2
  • 23
  • 22