2

I have implemented almost the same code in Objective-c, and it runs two to three times faster than it does in Java. I'm trying to figure out which instructions may be the most resource intensive and see if there is a better way of doing the same thing that is more efficient in Java.

This is part of a routine that reads a large resultset from the database, and for each word that is returned, it checks to see if that word can be made from the letter tiles the player has. It includes support for blank tiles, which can be used as any letter. A blank tile will be represented by an underscore character.

Basically, for each word that is returned from the database, I iterate through each of the letters of the word, and look through the players array of available letters. If I find that letter, I remove it from the players array and keep going. If I don't find the letter, the word is discarded and the next word read. Unless, I find an underscore character in the player's array, then, I'll use that for the letter, and remove it from the array. If I get to the end of the database word's array of letters and have 'found' each one, then the word is saved in a list.

I've already timed various parts of the whole function and the database query happens pretty fast. It is just the processing of this cursor that is very slow. Any suggestions would be appreciated!

if (c.moveToFirst()) {

    do { 
        boolean found = false;
        int aValue = 0;
        int letterValue = 0;

        // Word and Word's length from the database
        String sWord = c.getString(0);
        int wordLength = c.getInt(1);

        // Refresh the Tile array, underscores sorted to the front
        // sortedTiles sorted the players tiles {_,_,a,b,c}
        char[] aTiles = sortedTiles.clone();

        // Calculate the value of the word
        for (int i = 0; i < wordLength; i++) {

            // For each character in the database word
            switch (sWord.charAt(i)) {
            case 97:
                letterValue = 1;
                break;
            case 98:
                letterValue = 4;
                break;
            case 99:
                letterValue = 4;
                break;
            case 100:
                letterValue = 2;
                break;
            case 101:
                letterValue = 1;
                break;
            case 102:
                letterValue = 4;
                break;
            case 103:
                letterValue = 3;
                break;
            case 104:
                letterValue = 3;
                break;
            case 105:
                letterValue = 1;
                break;
            case 106:
                letterValue = 10;
                break;
            case 107:
                letterValue = 5;
                break;
            case 108:
                letterValue = 2;
                break;
            case 109:
                letterValue = 4;
                break;
            case 110:
                letterValue = 2;
                break;
            case 111:
                letterValue = 1;
                break;
            case 112:
                letterValue = 4;
                break;
            case 113:
                letterValue = 10;
                break;
            case 114:
                letterValue = 1;
                break;
            case 115:
                letterValue = 1;
                break;
            case 116:
                letterValue = 1;
                break;
            case 117:
                letterValue = 2;
                break;
            case 118:
                letterValue = 5;
                break;
            case 119:
                letterValue = 4;
                break;
            case 120:
                letterValue = 8;
                break;
            case 121:
                letterValue = 3;
                break;
            case 122:
                letterValue = 10;
                break;
            default:
                letterValue = 0;
                break;
            } // switch

            found = false;

            // Underscores will be sorted to the front of the array, 
            // so start from the back so that we give
            // real letters the first chance to be removed.
            for (int j = aTiles.length - 1; j > -1; j--) {
                if (aTiles[j] == sWord.charAt(i)) {
                    found = true;
                    // Increment the value of the word
                    aValue += letterValue;

                    // Blank out the player's tile so it is not reused
                    aTiles[j] = " ".charAt(0);

                    // I was removing the element from the array
                    // but I thought that might add overhead, so
                    // I switched to just blanking that letter out
                    // so that it wont be used again
                    //aTiles = removeItem(aTiles, j);

                    break;
                }

                if (aTiles[j] == cUnderscore) {
                    found = true;

                    // Blank out the player's tile so it is not reused
                    aTiles[j] = " ".charAt(0);

                    // I was removing the element from the array
                    // but I thought that might add overhead, so
                    // I switched to just blanking that letter out
                    // so that it wont be used again
                    //aTiles = removeItem(aTiles, j);
                    break;
                }

            } // for j

            // If a letter in the word could not be fill by a tile 
            // or underscore, the word doesn't qualify
            if (found == false) {
                break;
            }

        } // for i

        // If all the words letters were found, save the value and add to the list.
        if (found == true) {

            // if all the tiles were used it's worth extra points
            String temp = aTiles.toString().trim();

            if (temp.length() < 1) {
                aValue += 35;
            }

            Word word = new Word();
            word.word = sWord;
            word.length = wordLength;
            word.value = aValue;
            listOfWords.add(word);
        }

    } while (c.moveToNext());
}
Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
Michael Dougan
  • 1,698
  • 1
  • 9
  • 13
  • 7
    `" ".charAt(0)` can be written simply as `' '` – Piotr Praszmo Oct 15 '11 at 23:47
  • Are you running the objective-c version on the iphone and the java version on an android? If so, are the hardware of similar speed? – blizpasta Oct 15 '11 at 23:58
  • I would convert the word string into a character array before starting your loop through each character. array access is likely faster than charAt(), even if charAt is doing the same thing, remove the extra method call from the stack. also +1 @Banthar for that little tidbit, for something coming from C code, this code is surprisingly object heavy. – gnomed Oct 16 '11 at 00:04
  • one other tip i always tell anyone using a switch or else if statement. Always put the common cases first, these values appear to be in sequence in your code. if you put the common ones first then they will short circut the path before all the other comparisons are made. at least put the vowels first. – gnomed Oct 16 '11 at 00:12
  • this really belongs on http://codereview.stackexchange.com – Moog Oct 16 '11 at 23:15
  • Banthar, thanks for that. blizpasta, both iPhone and Android tests were conducted on their respective simulators, on different computers, but should have similar performance. gnomed, both good ideas, I ended up getting rid of the switch statement and tabling the letterValues as suggested by Ted below, which saved a couple of seconds. @Merlin, thanks, I'll know for next time. I'm new here! – Michael Dougan Oct 17 '11 at 19:21

4 Answers4

8

I don't know exactly where your code is spending most of its time. You should profile for that. But I would replace your long switch statement with a table lookup:

// In the class:
private static final int[] letterValues = {
    1, 4, 4, 2, 1, // etc.
}

// In the code:

// Calculate the value of the word
for (int i = 0; i < wordLength; i++) {

    // For each character in the database word
    char ch = sWord.charAt(i);
    if (ch >= 97 && ch <= 122) {
        letterValue = letterValues[ch - 97];
    } else {
        letterValue = 0;
    }

    // the rest of your code

This is likely to be much faster than a switch statement.

EDIT: I notice that inside your j loop you are calling sWord.charAt(i) for each j value. You can speed things up a bit more by factoring that function call out of the loop. If you use my code, you can just use ch in place of sWord.charAt(i).

P.S. As a matter of style, it's nicer to code if (found) { ... instead of if (found == true) { .... Likewise use if (!found) { instead of if (found == false) {.

Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
  • converting the database word, sWord into a char array sounds good, and/or getting the charAt outside of the j loop will definitely help. That's a nice approach for getting the letterValue too. – Michael Dougan Oct 17 '11 at 18:11
  • I don't know how much converting to a char array will help, particularly if you are only calling `charAt` once for each letter position. The only way to know, though, is to try it both ways and profile it (or at least time it). – Ted Hopp Oct 17 '11 at 18:22
  • I just tried tabling the letterValues and for a 50,000 word result set, it shaved 2 seconds off the time compared to using the switch statement. And, it looks a lot cleaner in the code too, thanks! – Michael Dougan Oct 17 '11 at 19:00
1

I think the switch statement will probably be turned into a jump table by the compiler, so I don't see an issue with that.

On the other hand, you can probably use a better data structure for your player's hand. Right now, you're basically using a triply nested loop:

  1. Iterating through every word in the database
  2. Iterating through every character in the word
  3. Iterating through every character in the player's tile array

The first two cannot be avoided. On the other hand, you can use a hash table or some kind of O(N) lookup data structure for the third.

I'd probably represent the hand as an array of 27 ints. Each element represents a letter or "_", and its value is the number of tiles in the hand. When you find a matching tile, you can decrement its value. If the value is already zero, then you know the player doesn't have that tile.

But as Ted pointed out, your best bet is to use a profiler to find the most expensive calls. Then figure out how to make as few of those calls as possible.

pepsi
  • 6,785
  • 6
  • 42
  • 74
1

You tend to get answers that are guesses.

The thing to do is, on each platform, just squeeze the code until it's optimal. Then if there's any speed differential, at least you'll know each code is as fast as possible.

Profiling is what's often recommended, but here's an example of how I do it.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
  • Thanks for the example, I'll read through it to learn how to profile properly. I did put system timers all throughout the code to try to figure out where the biggest delays were... more details below! – Michael Dougan Oct 17 '11 at 18:13
  • @Michael: Yeah, if you get good at this, you'll actually have a rare skill. The general wisdom is "measure measure" or "use a profiler", but if you ever hear (which you don't) how much speedup was achieved, it's maybe like 10%-40%. Pretty anemic. Not like the 43 *times* in that first link. – Mike Dunlavey Oct 17 '11 at 18:46
0

Thanks everyone. I was expecting to get e-mail alerts, so, I didn't realize people were already responding.

I ended up placing log printouts of system times at various places in the cursor loop to try and determine what was taking the most time. The initial moveToFirst() took the most time, but that was to be expected, and there is really nothing I could do about it. I noticed that most words processed in two or three milliseconds, which should be plenty fast enough, but then, for no apparent reason, one word would take 20 or 30 ms. I reasoned that there must be some garbage collection going on in the background, so, I sought to reduce allocations/deallocations as much as possible. In all of the loops, rather than declare the variable like for (int i = 0... to for (i = 0, moving the variable declaration to the top of the method. This helped a little, but I'd still left one line untouched. When I changed that, it made all the difference in the world.

    // Refresh the Tile array, underscores sorted to the front

    for (i = 0; i < sortedTiles.length; i++) {
        aTiles[i] = sortedTiles[i];
    }

    // Instead of doing it this way                 
    //aTiles = sortedTiles.clone();

I allocate aTiles above the cursor loop, and here, I'm just re-initializing it with the characters in the players hand. This one clone() was what was making frequent garbage collection necessary, and killing my performance.

You've all provided great suggestions, and I'll further tweak the code to eek out better performance based on your advice. Thank you very much!

Michael Dougan
  • 1,698
  • 1
  • 9
  • 13
  • You can replace that loop with this call: `System.arraycopy(sortedTiles, 0, aTiles, 0, sortedTiles.length);`. It shouldn't run a lot faster, but it's a bit cleaner. – Ted Hopp Oct 17 '11 at 19:06
  • int won't be garbage collected. Only references can be garbage collected, not primitives. – TofuBeer Oct 17 '11 at 19:27
  • OK, but it still seems that by declaring any variable inside of a code block, it will cause that variable to go in and out of scope, and I would think that behind the scenes, that would cause extra time allocating resources for that variable. Even if it is not something that would affect garbage collection. I think reusing a variable would be preferable in most cases. – Michael Dougan Oct 17 '11 at 22:30