
I'm trying to check if a certain char is a vowel. What's the best way to go about doing this?

durron597
Ky -
  • I'm hoping this'll help out people answering questions that are much broader or too specific, like http://stackoverflow.com/q/19160921/, which focuses on a single word; http://stackoverflow.com/q/20454840/, which also deals with compile errors; or http://stackoverflow.com/q/16432482/, which asks specifically about the character at the end of a string. – Ky - Oct 24 '14 at 23:17

5 Answers


Here's the solution I've been using for a while, and it hasn't let me down yet:

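// Vowels with and without diacritics, mostly upper case; isVowel upper-cases its argument before the lookup
private static final String VOWELS = "AÀÁÂÃÄÅĀĂĄǺȀȂẠẢẤẦẨẪẬẮẰẲẴẶḀÆǼEȄȆḔḖḘḚḜẸẺẼẾỀỂỄỆĒĔĖĘĚÈÉÊËIȈȊḬḮỈỊĨĪĬĮİÌÍÎÏĲOŒØǾȌȎṌṎṐṒỌỎỐỒỔỖỘỚỜỞỠỢŌÒÓŎŐÔÕÖUŨŪŬŮŰŲÙÚÛÜȔȖṲṴṶṸṺỤỦỨỪỬỮỰYẙỲỴỶỸŶŸÝ";
private static boolean isVowel(char c)
{
    // indexOf returns -1 when c is absent, so ">= 0" also accepts 'A' at index 0
    return VOWELS.indexOf(Character.toUpperCase(c)) >= 0;
}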

For my applications, it's reasonably fast.
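A minimal, self-contained sketch of how the method above might be exercised. To sidestep the `-encoding UTF-8` issue mentioned in the comments, the vowel list here is abbreviated to the Latin-1 vowels plus Œ and Ÿ and written with Unicode escapes; swap in the full `VOWELS` string for real use. The class name `VowelDemo` and the helper `countVowels` are hypothetical.

```java
public class VowelDemo {
    // Abbreviated, upper-case-only vowel list written with Unicode escapes
    // so the source stays plain ASCII; use the full VOWELS string for real use.
    private static final String VOWELS =
            "A\u00C0\u00C1\u00C2\u00C3\u00C4\u00C5\u00C6"
            + "E\u00C8\u00C9\u00CA\u00CB"
            + "I\u00CC\u00CD\u00CE\u00CF"
            + "O\u00D2\u00D3\u00D4\u00D5\u00D6\u00D8\u0152"
            + "U\u00D9\u00DA\u00DB\u00DC"
            + "Y\u00DD\u0178";

    // Upper-case the char, then look it up; indexOf returns -1 when absent.
    static boolean isVowel(char c) {
        return VOWELS.indexOf(Character.toUpperCase(c)) >= 0;
    }

    // Hypothetical helper: counts vowels one UTF-16 char at a time.
    static int countVowels(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isVowel(s.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countVowels("caf\u00E9"));  // "cafe" with e-acute: 2
        System.out.println(countVowels("\u00C6ther")); // "AEther" with AE ligature: 2
    }
}
```

Because `isVowel` takes a `char`, `countVowels` walks UTF-16 code units; that is enough for these letters, which all sit in the Basic Multilingual Plane.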

Ky -
  • @Clashsoft make sure you're compiling with UTF-8: `javac "/path/to/your/file.java" -encoding UTF-8`. Also save it in a UTF-8 file without the BOM. – Ky - Feb 15 '15 at 01:32
  • Your solution does not recognize 'A' as a vowel, because `indexOf` returns `0`, but `0 > 0` is false. – fredoverflow Jul 13 '18 at 19:41
  • Thanks, @fredoverflow! – Ky - Jul 13 '18 at 19:52
  • Good solution. The obvious limitation, which is easily rectified, is that it deals only with Western scripts. The trickier issue is that in some languages the same letter can be both a consonant and a vowel, like *W* in Welsh. – diginoise Jul 30 '19 at 09:57
  • @diginoise Absolutely true! Feel free to post an answer that suits your situation; it'll likely help others too! – Ky - Jul 31 '19 at 05:06

One way to do this is with if-else or a switch statement, as in @TylerWeaver's answer. If you want to do it in one line, just use regular expressions.

Something like this:

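For vowels:

aStr.matches("[aeiouAEIOU]")

For consonants:

aStr.matches("[a-zA-Z&&[^aeiouAEIOU]]")

`matches` tests the whole string, so `aStr` must be a one-character string; for a `char c`, use `String.valueOf(c)`.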

Regular expressions make life very simple and are fairly easy to learn, too; a regex cheatsheet is a good place to start.

Here `[aeiou]` is a character class: it matches exactly one character that is a, e, i, o, or u. A leading `^` negates the class, so `[^aeiou]` on its own matches any single character other than those five, including upper-case vowels, digits, spaces, and punctuation, not just consonants.
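A sketch that accepts both upper- and lower-case vowels and, because a negated class alone also matches digits, spaces and punctuation, restricts consonants to ASCII letters using Java's `&&` character-class intersection (documented in `java.util.regex.Pattern`). It assumes plain English vowels a, e, i, o, u, so y counts as a consonant here. The patterns are compiled once, since `String.matches` compiles its regex on every call. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class RegexVowels {
    // Compiled once and reused.
    private static final Pattern VOWEL = Pattern.compile("[aeiouAEIOU]");
    // ASCII letters minus the vowels (class intersection with &&),
    // so digits, spaces and punctuation are not reported as consonants.
    private static final Pattern CONSONANT =
            Pattern.compile("[a-zA-Z&&[^aeiouAEIOU]]");

    static boolean isVowel(char c) {
        // matches() must consume the whole input: here, exactly one char
        return VOWEL.matcher(String.valueOf(c)).matches();
    }

    static boolean isConsonant(char c) {
        return CONSONANT.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isVowel('E'));     // true
        System.out.println(isConsonant('7')); // false
        System.out.println(isConsonant('k')); // true
    }
}
```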

Ky -
Chiseled

Create a switch statement. For example:

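static boolean isVowel(char foo) {
  // Lower-case first so 'A' and 'a' reach the same case label
  switch (Character.toLowerCase(foo)) {
    case 'a':
    case 'e':
    case 'i':
    case 'o':
    case 'u':
    case 'y': // this answer counts 'y' as a vowel; drop this label if you don't
      return true;
    default:
      return false;
  }
}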

For Unicode, follow the answer to "How do I detect unicode characters in a Java string?" and then use the switch statement I provided.

Community
DanSchneiderNA
  • This is what I tried at first, but it failed on letters with accents like `'é'` as in `"café"` or `'Æ'` as in `"Æther"` – Ky - Oct 24 '14 at 23:28
  • Ahh. If you want Unicode characters, just add cases for them. Also, @Ben_Leggiero's solution works well for Unicode characters. – DanSchneiderNA Oct 24 '14 at 23:30
  • Sadly, Unicode doesn't have a class or property for vowels. That makes sense, because some scripts (ideographic ones, for example) don't really have a sensible notion of a vowel. As long as you're reasonably sure what you consider vowels (for example, you can probably ignore Japanese kana), @BenLeggiero's approach is fine. – jjm Oct 24 '14 at 23:46
  • @BenLeggiero See this question: http://stackoverflow.com/questions/1008802/converting-symbols-accent-letters-to-english-alphabet. There are a couple of possible methods, but I haven't tried them. If one works, you can then check the resulting characters for `[aeiou]`. (You may have to add `æ`.) – ajb Oct 24 '14 at 23:51
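A minimal sketch of the decomposition approach ajb describes, using the JDK's `java.text.Normalizer` with canonical decomposition (NFD). It assumes that the first code point after decomposition is the base letter, which holds for precomposed Latin letters like é but is not a general Unicode rule. Letters such as æ, ø and œ have no canonical decomposition, so they are listed explicitly. The class and method names are hypothetical.

```java
import java.text.Normalizer;

public class NormalizedVowels {
    // Hypothetical helper: canonically decompose one code point (NFD), so that
    // e-acute becomes 'e' followed by a combining acute accent, and return the
    // first code point, assumed to be the base letter.
    static int baseLetter(int codePoint) {
        String decomposed = Normalizer.normalize(
                new String(Character.toChars(codePoint)), Normalizer.Form.NFD);
        return decomposed.codePointAt(0);
    }

    static boolean isVowel(int codePoint) {
        int base = Character.toLowerCase(baseLetter(codePoint));
        switch (base) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
            // The ae and oe ligatures and o-with-stroke have no canonical
            // decomposition, so they are listed explicitly.
            case '\u00E6': case '\u00F8': case '\u0153':
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isVowel(0x00E9)); // e-acute: true
        System.out.println(isVowel(0x00D1)); // N-tilde: false
    }
}
```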

I've tried to implement this to cover as many languages with vowel-like letters as possible. By my count, there are 637 Unicode letters that would be useful to count as vowels. I have a class for vowels with a static block that sets up a HashSet of Strings, one per vowel. I use a method that takes a code point (int) rather than a char:

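import java.util.HashSet;
import java.util.Set;

public class Vowels {

  public Vowels() {
  }

  // One single-code-point String per vowel; non-ASCII entries are written as
  // Unicode escapes so the source file stays 7-bit ASCII
  public static final Set<String> vowelStrs = new HashSet<String>();

  static {
    vowelStrs.add("A");
    vowelStrs.add("E");
    vowelStrs.add("I");
    // ...
    vowelStrs.add("\u00c4");
    // ...
    vowelStrs.add("\ua66b");
    vowelStrs.add("\ua66c");
    vowelStrs.add("\ua66d");
  }

  // Takes a code point rather than a char, so letters outside the BMP work too
  public static boolean isMember(int inChar) {
    String inStr = new String(Character.toChars(inChar));
    return vowelStrs.contains(inStr);
  }
}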
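A `char` is a single UTF-16 code unit, so a letter outside the Basic Multilingual Plane arrives as two surrogate `char`s; taking a code point avoids that. A sketch of feeding a whole string through a code-point check with `String.codePoints()` (Java 8+), here storing the vowels as `Integer` code points rather than one-character `String`s. The class, the tiny vowel list and `countVowels` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CodePointVowels {
    // Tiny illustrative subset; the answer's real list has hundreds of entries.
    private static final Set<Integer> VOWELS = new HashSet<>(Arrays.asList(
            (int) 'A', (int) 'E', (int) 'I', (int) 'O', (int) 'U',
            (int) 'a', (int) 'e', (int) 'i', (int) 'o', (int) 'u',
            0x00C4, 0x00E4)); // A and a with diaeresis

    static boolean isMember(int codePoint) {
        return VOWELS.contains(codePoint);
    }

    // codePoints() yields one int per character, pairing surrogates correctly,
    // so supplementary-plane letters are looked up whole rather than as halves.
    static long countVowels(String s) {
        return s.codePoints().filter(CodePointVowels::isMember).count();
    }

    public static void main(String[] args) {
        System.out.println(countVowels("K\u00E4se")); // 2
    }
}
```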
  • I love the idea of using sets! – Ky - Feb 21 '17 at 15:45
  • I'm actually up to 800+ now, with some help from speakers of Hangul and Tagalog. A better solution is probably to put this list into a UTF-8 text file as a resource and load the set from it. Text editors have a habit of adding unwelcome characters or translations to your data, though (a BOM, not understanding UTF-8, etc.). With the current approach, everything is in 7-bit ASCII and the compiler checks for formatting mistakes. – SplendidSplinter Feb 21 '17 at 20:07
  • That's really awesome. Let me know when you have a complete list and put it up on GitHub or something :D – Ky - Feb 22 '17 at 18:22
  • You need to publish a definitive list of Unicode vowels, as a public service. (Not useful to me, but I'm sure that others would appreciate it!) – Jeff Grigg Jul 18 '17 at 19:32
  • @JeffGrigg I was about to say that you could use the [Unicode Character Database](http://www.unicode.org/ucd/) for that, but looking through it, it doesn't have any way to tell if a character is a vowel or not – Ky - Jul 31 '19 at 05:29
  • See also... https://stackoverflow.com/questions/38792789/how-to-match-unicode-vowels The consensus seems to be that there is no generally correct answer to your question, and that to address your needs, you'll need to research or present more information about your underlying requirements. What are you trying to accomplish that makes you think that identifying all the "UNICODE vowel characters" would help you do that? – Jeff Grigg Aug 01 '19 at 13:48
  • @JeffGrigg I have no idea what problem I was trying to solve back in 2014 when I was asking the question whose answer you posted this comment on. I assume it was some sort of language processing thing, given what I was into in 2014. I noted Unicode because Java strings are Unicode strings, and because the Unicode Consortium has done a lot of research into languages and the properties of their glyphs. – Ky - Nov 16 '20 at 20:54
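A sketch of the resource-file idea from the comments, assuming a hypothetical UTF-8 text file with one vowel per line (blank lines ignored). Reading through an `InputStreamReader` with `StandardCharsets.UTF_8` makes the decoding explicit instead of depending on the platform default, and a leading byte-order mark is stripped because, as noted above, editors sometimes add one. The parser takes an `InputStream` so it can be fed either `getResourceAsStream("vowels.txt")` (hypothetical file name) or an in-memory stream. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

public class VowelResource {
    // Reads one vowel per line and returns the set of their code points.
    static Set<Integer> load(InputStream in) throws IOException {
        Set<Integer> vowels = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // Drop a byte-order mark if an editor added one
                if (first && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                first = false;
                line = line.trim();
                if (!line.isEmpty()) {
                    vowels.add(line.codePointAt(0));
                }
            }
        }
        return vowels;
    }

    public static void main(String[] args) throws IOException {
        // In an application this stream would come from something like
        // VowelResource.class.getResourceAsStream("vowels.txt")
        byte[] data = "\uFEFFA\n\u00C4\n\na\n".getBytes(StandardCharsets.UTF_8);
        Set<Integer> vowels = load(new ByteArrayInputStream(data));
        System.out.println(vowels.size()); // 3
    }
}
```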

Riffing on the accepted answer, here is a solution that requires at most 2 efficient table lookups:

public static boolean isVowel(char c) {
    switch (c) {
        case 65:
        case 69:
        case 73:
        case 79:
        case 85:
        case 89:
        case 97:
        case 101:
        case 105:
        case 111:
        case 117:
        case 121:
        case 192:
        case 193:
        case 194:
        case 195:
        case 196:
        case 197:
        case 198:
        case 200:
        case 201:
        case 202:
        case 203:
        case 204:
        case 205:
        case 206:
        case 207:
        case 210:
        case 211:
        case 212:
        case 213:
        case 214:
        case 216:
        case 217:
        case 218:
        case 219:
        case 220:
        case 221:
        case 224:
        case 225:
        case 226:
        case 227:
        case 228:
        case 229:
        case 230:
        case 232:
        case 233:
        case 234:
        case 235:
        case 236:
        case 237:
        case 238:
        case 239:
        case 242:
        case 243:
        case 244:
        case 245:
        case 246:
        case 248:
        case 249:
        case 250:
        case 251:
        case 252:
        case 253:
        case 255:
        case 256:
        case 257:
        case 258:
        case 259:
        case 260:
        case 261:
        case 274:
        case 275:
        case 276:
        case 277:
        case 278:
        case 279:
        case 280:
        case 281:
        case 282:
        case 283:
        case 296:
        case 297:
        case 298:
        case 299:
        case 300:
        case 301:
        case 302:
        case 303:
        case 304:
        case 305:
        case 306:
        case 307:
        case 332:
        case 333:
        case 334:
        case 335:
        case 336:
        case 337:
        case 338:
        case 339:
        case 360:
        case 361:
        case 362:
        case 363:
        case 364:
        case 365:
        case 366:
        case 367:
        case 368:
        case 369:
        case 370:
        case 371:
        case 374:
        case 375:
        case 376:
        case 506:
        case 507:
        case 508:
        case 509:
        case 510:
        case 511:
        case 512:
        case 513:
        case 514:
        case 515:
        case 516:
        case 517:
        case 518:
        case 519:
        case 520:
        case 521:
        case 522:
        case 523:
        case 524:
        case 525:
        case 526:
        case 527:
        case 532:
        case 533:
        case 534:
        case 535:
            return true;
        default:
            switch (c) {
                case 7680:
                case 7681:
                case 7700:
                case 7701:
                case 7702:
                case 7703:
                case 7704:
                case 7705:
                case 7706:
                case 7707:
                case 7708:
                case 7709:
                case 7724:
                case 7725:
                case 7726:
                case 7727:
                case 7756:
                case 7757:
                case 7758:
                case 7759:
                case 7760:
                case 7761:
                case 7762:
                case 7763:
                case 7794:
                case 7795:
                case 7796:
                case 7797:
                case 7798:
                case 7799:
                case 7800:
                case 7801:
                case 7802:
                case 7803:
                case 7833:
                case 7840:
                case 7841:
                case 7842:
                case 7843:
                case 7844:
                case 7845:
                case 7846:
                case 7847:
                case 7848:
                case 7849:
                case 7850:
                case 7851:
                case 7852:
                case 7853:
                case 7854:
                case 7855:
                case 7856:
                case 7857:
                case 7858:
                case 7859:
                case 7860:
                case 7861:
                case 7862:
                case 7863:
                case 7864:
                case 7865:
                case 7866:
                case 7867:
                case 7868:
                case 7869:
                case 7870:
                case 7871:
                case 7872:
                case 7873:
                case 7874:
                case 7875:
                case 7876:
                case 7877:
                case 7878:
                case 7879:
                case 7880:
                case 7881:
                case 7882:
                case 7883:
                case 7884:
                case 7885:
                case 7886:
                case 7887:
                case 7888:
                case 7889:
                case 7890:
                case 7891:
                case 7892:
                case 7893:
                case 7894:
                case 7895:
                case 7896:
                case 7897:
                case 7898:
                case 7899:
                case 7900:
                case 7901:
                case 7902:
                case 7903:
                case 7904:
                case 7905:
                case 7906:
                case 7907:
                case 7908:
                case 7909:
                case 7910:
                case 7911:
                case 7912:
                case 7913:
                case 7914:
                case 7915:
                case 7916:
                case 7917:
                case 7918:
                case 7919:
                case 7920:
                case 7921:
                case 7922:
                case 7923:
                case 7924:
                case 7925:
                case 7926:
                case 7927:
                case 7928:
                case 7929:
                    return true;
            }
    }
    return false;
}

The nested switches keep each group of cases dense, so javac generates two constant-time tableswitch instructions. A single switch spanning the large gap between 535 and 7680 would instead compile to a lookupswitch instruction, which the JVM typically searches in logarithmic time.
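On the literal question in the comments: a character literal is a compile-time constant of type `char`, so `case '\u00C0':` and `case 192:` give javac the same case value, and the two-switch split should work the same way. A sketch with only a few of the cases; with so few labels the trimmed switches are much sparser than the original, so javac may choose lookupswitch for them, and running `javap -c` on the compiled class shows which instruction was emitted. The class name is hypothetical.

```java
public class SplitSwitchVowels {
    // Only a handful of the answer's cases, written as char literals.
    public static boolean isVowel(char c) {
        switch (c) {
            case 'A': case 'E': case 'I': case 'O': case 'U': case 'Y':
            case 'a': case 'e': case 'i': case 'o': case 'u': case 'y':
            case '\u00C0': // A with grave (192)
            case '\u00C9': // E with acute (201)
            case '\u00E9': // e with acute (233)
                return true;
            default:
                // Second, separate range (Latin Extended Additional block)
                switch (c) {
                    case '\u1EA0': // A with dot below (7840)
                    case '\u1EA1': // a with dot below (7841)
                    case '\u1EB8': // E with dot below (7864)
                    case '\u1EB9': // e with dot below (7865)
                        return true;
                }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isVowel('\u1EB9')); // true
        System.out.println(isVowel('b'));      // false
    }
}
```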

fredoverflow
  • Why use cryptic numbers instead of character literals? – Ky - Jul 14 '18 at 02:23
  • With character literals, the split into 2 switches won't make any sense. – fredoverflow Jul 14 '18 at 02:40
  • can you elaborate on that? I don't know much about bytecode – Ky - Jul 14 '18 at 03:02
  • @Ben Leggiero take a look at this question https://stackoverflow.com/questions/10287700/difference-between-jvms-lookupswitch-and-tableswitch – Bharat Aug 25 '19 at 17:29
  • @Bharat Okay, I read the answers there and it's clear why you split it into 2 switches. What I still don't understand is why you don't use character literals; AFAIK they're compiled to their numeric UTF-16 values, so the bytecode would be the same, unless I'm mistaken. – Ky - Aug 27 '19 at 15:46