-2

Possible Duplicate:
Detecting syllables in a word

Assume the input string is "saya sedang makan nasi goreng". I want to break it into syllables: "sa", "ya", "se", "dang", "ma", "kan", "na", "si", "go", "reng".

How can I do this in Java? Can somebody help me?

Cintz
  • I think this is a problem of defining the formal rules for what constitutes a syllable in Indonesian, rather than a programming problem. Once you have defined the formal rules, the programming should be *trivial*. – Klaus Byskov Pedersen Jan 14 '12 at 12:31
  • A good place to start: http://stackoverflow.com/questions/405161/detecting-syllables-in-a-word – Jean Logeart Jan 14 '12 at 12:31
  • @KlausByskovHoffmann The program *may* be trivial, but the dictionary that it needs might be rather large :) – Sergey Kalinichenko Jan 14 '12 at 12:35
  • @dasblinkenlight - If you read the linked Q/A's, syllable splitting in English is done by applying a fixed set of rules and using a *small* dictionary of words that the rules don't work for. I'd expect the rule set / dictionary to be smaller for Indonesian because the spelling / pronunciation are more consistent. – Stephen C Jan 14 '12 at 12:52

3 Answers

1

Without voice input, you need a 'Syllable Dictionary' to do that.

EDIT: It's been discussed on this site already.

bchetty
1

That's not an easy thing to do. If you still want to do it, I think your best bet is to find a dictionary database that gives the syllable breakdown for every word (though these are hard to find), download it, and write a program that queries the database for each word's syllable breakdown.
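
A minimal sketch of the lookup side of this approach, assuming the syllable data has already been loaded into memory. The SyllableDictionary class and its put and lookup methods are hypothetical names used only for illustration, and the in-memory map stands in for whatever database is actually used.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for a syllable dictionary database.
final class SyllableDictionary {
    private final Map<String, List<String>> entries = new HashMap<>();

    /** Registers the syllable breakdown for one word (case-insensitive key). */
    public void put(String word, String... syllables) {
        entries.put(word.toLowerCase(Locale.ROOT), Arrays.asList(syllables));
    }

    /** Returns the stored breakdown, or an empty Optional for unknown words. */
    public Optional<List<String>> lookup(String word) {
        return Optional.ofNullable(entries.get(word.toLowerCase(Locale.ROOT)));
    }
}
```

Returning an Optional makes the "word not in the dictionary" case explicit, which matters here because any real dictionary will have gaps.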

Divya
0

Here's a naive solution:

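import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "saya sedang makan nasi goreng";
// Optional non-vowel character, a vowel, then an optional trailing "ng" or "n".
Matcher m = Pattern.compile("[^aeiou]?[AEIOUaeiou](ng|n)?").matcher(input);
int s = 0; // start of the current syllable
while (m.find()) {
  // Cut from the end of the previous match so skipped characters are not lost;
  // trim() removes the space between words.
  System.out.println(input.substring(s, m.end()).trim());
  s = m.end();
}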

Edit:
@Stephen C is right. Here's a proper solution based on syllable formation rules of the Indonesian language (from source)

In Indonesian a syllable consists of a vowel plus the immediately preceding consonant. It also includes any following consonant that does not immediately precede the next vowel.

Note that ng counts as a single consonant.

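String input = "SAYA sedang makan nasi goreng garam asal saat air ia bentuk";
// Optional leading non-vowel, a vowel, then an optional coda ("ng" or one
// non-vowel character) that is kept only when no vowel follows it.
Matcher m = Pattern.compile("[^aeiou]?[aeiou]((ng|[^aeiou])(?![aeiou]))?",
              Pattern.CASE_INSENSITIVE).matcher(input);
int s = 0; // start of the current syllable
while (m.find()) {
  // [^aeiou] also matches spaces, hence the trim().
  System.out.println(input.substring(s, m.end()).trim());
  s = m.end();
}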

Please note that, as the source above also mentions, syllables as they are pronounced in speech may differ slightly, e.g. in speech: ma-kan-an; program output: ma-ka-nan.

Edit 2: OK. Further study revealed that I had missed the ny, sy and kh consonants. I also fixed a couple of other problems. Here's the updated regular expression:

"(ng|ny|sy|kh|[^aeiou])?[aeiou]((ng|ny|sy|kh|([^aeiou](?![gyh]))(?![aeiou])))?"
rodion
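
For anyone who wants to try the updated expression from Edit 2, here is a minimal self-contained sketch that wraps it in a class. The Syllabifier class and its syllables method are hypothetical names, not part of the answer above. The sketch also assumes that splitting on whitespace first and lowercasing the input are acceptable substitutes for the trim() call and the Pattern.CASE_INSENSITIVE flag used earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Syllabifier {
    // The updated expression from Edit 2, copied unchanged.
    private static final Pattern SYLLABLE = Pattern.compile(
        "(ng|ny|sy|kh|[^aeiou])?[aeiou]((ng|ny|sy|kh|([^aeiou](?![gyh]))(?![aeiou])))?");

    /** Splits every whitespace-separated word of the text into syllables. */
    static List<String> syllables(String text) {
        List<String> result = new ArrayList<>();
        // Lowercasing stands in for Pattern.CASE_INSENSITIVE (an assumption of this sketch).
        for (String word : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            Matcher m = SYLLABLE.matcher(word);
            int start = 0;
            while (m.find()) {
                // Cut from the end of the previous match so skipped letters stay attached.
                result.add(word.substring(start, m.end()));
                start = m.end();
            }
            // As in the original loop, letters after the last match are dropped.
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [sa, ya, se, dang, ma, kan, na, si, go, reng]
        System.out.println(syllables("saya sedang makan nasi goreng"));
    }
}
```

One thing to watch for: as the expression is grouped, the (?![aeiou]) lookahead covers only the single-consonant alternative, so a digraph directly after a vowel is always taken as a coda. With this sketch "dengan" comes out as deng, an, whereas the rule quoted above would give de, ngan.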