Linear Search Algorithm Optimization

Question

I just finished a homework problem for Computer Science 1 (yes, it's homework, but hear me out!). Now, the assignment is 100% complete and working, so I don't need help on it. My question involves the efficiency of an algorithm I'm using (we aren't graded on algorithmic efficiency yet, I'm just really curious).

The function I'm about to present currently uses a modified version of the linear search algorithm (that I came up with, all by myself!) in order to check how many numbers on a given lottery ticket match the winning numbers, assuming that both the numbers on the ticket and the numbers drawn are in ascending order. I was wondering, is there any way to make this algorithm more efficient?

/*
 * Function: ticketCheck
 *
 * @param struct ticket
 * @param array winningNums[6]
 *
 * Takes in a ticket, counts how many numbers
 * in the ticket match, and returns the number
 * of matches.
 *
 * Uses a modified linear search algorithm,
 * in which the index of the successor to the
 * last matched number is used as the index of
 * the first number tested for the next ticket value.
 *
 * @return int numMatches
 */
int ticketCheck( struct ticket ticket, int winningNums[6] )
{
    int numMatches = 0;
    int offset = 0;
    int i;
    int j;

    for( i = 0; i < 6; i++ )
    {
        for( j = 0 + offset; j < 6; j++ )
        {
            if( ticket.ticketNum[i] == winningNums[j] )
            {
                numMatches++;
                offset = j + 1;
                break;
            }
            if( ticket.ticketNum[i] < winningNums[j] )
            {
                i++;
                j--;
                continue;
            }
        }
    }

    return numMatches;
}

By make more efficient, does that mean change the search to some other type of search, or do you want to still use a linear search? — alternative, Aug 29 '10 at 17:16
It is very surprising. If two numbers match (first `if`), you keep looking for the same number `i` (`break;`). Do you expect either list to have duplicates? That's not how lottery works in my state. The second `if` is hard to understand, too. It seems to me that the `j--` is there to undo the `j++` that is going to be caused by `continue;`, but frankly, I'd prefer the sight of a clean `goto` than this kind of trick. — Pascal Cuoq, Aug 29 '10 at 17:17
in the inner loop you can stop when you see a number greater than the ticketnum[i]. as you know linear search like this takes O(n*n) time, where n is the number of lottery numbers. since O(C*n*n) is the same for any constant C, you will not change the complexity by stopping when you see the bigger number, but you will reduce the constant C, thus making it run faster. also i do not like the style if you do not mind my saying so. i would not change the loop indexes inside the loop. it is not readable. — akonsu, Aug 29 '10 at 17:17
@Andrew: There is a special homework tag, I added it for you. It is good manners to use that tag in addition to mention homework in the text. — Albin Sunnanbo, Aug 29 '10 at 17:20
@Albin Okay, thank you! I didn't intentionally omit it, I just didn't know it existed. I think I tried to make it fairly clear in the post body that this was a homework assignment, but I'll use the tag from now on if ever needed. — Andy, Aug 29 '10 at 17:23
for readability you may want to replace the struct ticket with `typedef struct { ... } ticket_t;` and then `int ticketCheck( struct ticket ticket, int winningNums[6] )` with `int ticketCheck( ticket_t ticket, int winningNums[6] )`. This is sort of how its done in linux code :) — ldog, Aug 29 '10 at 17:23
@Akonsu How do you recommend I go about retrying the inner loop without decrementing the index? Does C have the equivalent of a `retry` command? — Andy, Aug 29 '10 at 17:24
Andrew, i was not recommending to retry the inner loop. i was trying to say that you could find a better way to do the search without having to retry the loop. — akonsu, Aug 29 '10 at 20:52

Philip Potter · Accepted Answer · 2010-08-29T17:27:31.853

16

It's more or less there, but not quite. In most situations, it's O(n), but it's O(n^2) if every ticketNum is greater than every winningNum. (This is because the inner j loop doesn't break when j==6 like it should, but runs the next i iteration instead.)

You want your algorithm to increment either i or j at each step, and to terminate when i==6 or j==6. [Your algorithm almost satisfies this, as stated above.] As a result, you only need one loop:

for (i=0,j=0; i<6 && j<6; /* no increment step here */) {
    if (ticketNum[i] == winningNum[j]) {
        numMatches++;
        i++;
        j++;
    }
    else if (ticketNum[i] < winningNum[j]) {
        /* ticketNum[i] won't match any winningNum, discard it */
        i++;
    }
    else { /* ticketNum[i] > winningNum[j] */
        /* discard winningNum[j] similarly */
        j++;
    }
}

Clearly this is O(n); at each stage, it either increments i or j, so the most steps it can do is 2*n-1. This has almost the same behaviour as your algorithm, but is easier to follow and easier to see that it's correct.

edited Aug 29 '10 at 17:27

answered Aug 29 '10 at 17:21

Philip Potter

8,975
2
37
47

8

Your answer had better comments, so I removed mine. – Pascal Cuoq Aug 29 '10 at 17:28
3

There is an anecdote about two programmers — I don't remember which but it's probably two among Pike, Ritchie, Thomson and Kernighan — independently writing the same function to the comma. I always thought it was not so extraordinary. – Pascal Cuoq Aug 29 '10 at 17:31
1

@Pascal: I quite agree. If you are thinking the same algorithm, you'll write similar code. If you are on the same team, you'll use the same style. Same algorithm + same style = same code. – Philip Potter Aug 29 '10 at 17:34
1

+1 That's shows a clear understanding of the problem instead of concentrating on the given solution. – Praveen S Aug 29 '10 at 18:12
You could remove the reptition of the `i++;` and `j++;` by removing them from the first `if()`, then removing the `else`s and changing the other `if`s to `<=` and `>=`. – caf Aug 30 '10 at 00:47

score 7 · Answer 2 · edited May 23 '17 at 12:26

7

You're basically looking for the size of the intersection of two sets. Given that most lottos use around 50 balls (or so), you could store the numbers as bits that are set in an unsigned long long. Finding the common numbers is then a simple matter of ANDing the two together: commonNums = TicketNums & winningNums;.

Finding the size of the intersection is a matter of counting the one bits in the resulting number, a subject that's been covered previously (though in this case, you'd use 64-bit numbers, or a pair of 32-bit numbers, instead of a single 32-bit number).

edited May 23 '17 at 12:26

Community

1
1

answered Aug 29 '10 at 17:22

Jerry Coffin

476,176
80
629
1,111

1

Jeffry, this is smart but i personally do not like this style of programming. in my opinion code must be readable and operate on arrays if arrays are needed. using integers as arrays is not a good style i think. but it is just me. your solution would be great in an embedded system with limited memory and a slow CPU though. :) edit: but your response actually gives the general approach. iterate over the both sequences in parallel and emit 1 for a match and a zero for a mismatch. this i think is how the AND is implemented in the hardware anyway. – akonsu Aug 29 '10 at 17:28
@akonsu it's basically the same algo I proposed but using a bitfield instead of an array. If the sets are big it is a proven and widely used technique (`select` syscall on Unix for example). – Patrick Schlüter Aug 29 '10 at 17:32
1

@Akonsu: no problem about the name (as the old line goes, just don't call me late for dinner). Actually, in most hardware the AND is going to be executed bit-parallel (i.e., the CPU will have 32 or 64 (or whatever the word size size is) separate 2-input AND gates, all producing outputs at the same time). As for "arrays if arrays are needed", I tend to agree -- but this is a case where (IMO) arrays are *not* needed (nor probably even preferred). – Jerry Coffin Aug 29 '10 at 17:44
@caf: not entirely parallel anyway, that's entirely true. OTOH, I'm pretty sure you can execute parts of it in parallel, so the ten (or so) total operations can happen in something like four clock cycles, with ~3 operations in parallel for each of the first three clocks, then a single operation to combine them all at the end. – Jerry Coffin Aug 30 '10 at 04:47

score 2 · Answer 3 · answered Aug 29 '10 at 17:24

2

Yes, there is something faster, but probably using more memory. Make an array full of 0 in the size of the possible numbers, put a 1 on every drawn number. For every ticket number add the value at the index of that number.

 int NumsArray[MAX_NUMBER+1];
 memset(NumsArray, 0, sizeof NumsArray);

 for( i = 0; i < 6; i++ )
   NumsArray[winningNums[i]] = 1;

 for( i = 0; i < 6; i++ )
   numMatches += NumsArray[ticket.ticketNum[i]];

12 loop rounds instead of up to 36 The surrounding code left as an exercise.

EDIT: It also has the advantage of not needing to sort both set of values.

answered Aug 29 '10 at 17:24

Patrick Schlüter

11,394
1
43
48

1

This is the same speed as my solution (~2n loop iterations) but uses more memory. It's faster in the unsorted case, however, because my answer depends on sorted data while yours doesn't. – Philip Potter Aug 29 '10 at 17:41
This method is also easy to extend to correctly handle duplicate numbers in input by using NumsArray[winningNums[i]] +=1 instead. – Christoffer Aug 29 '10 at 17:46
Yes, mine has also the advantage if you want to check more than one loterie ticket against the winning numbers, to need only the first loop once. – Patrick Schlüter Aug 29 '10 at 17:49

score 0 · Answer 4 · answered Aug 29 '10 at 17:22

This is really only a minor change on a scale like this, but if the second loop reaches a number bigger than the current ticket number, it is already allowed to brake. Furthermore, if your seconds traverses numbers lower than your ticket number, it may update the offset even if no match is found within that iteration.

PS: Not to forget, general results on efficiency make more sense, if we take the number of balls or the size of the ticket to be variable. Otherwise it is too much dependent of the machine.

score 0 · Answer 5 · answered Aug 29 '10 at 18:35

If instead of comparing the arrays of lottery numbers you were to create two bit arrays of flags -- each flag being set if it's index is in that array -- then you could perform a bitwise and on the two bit arrays (the lottery ticket and the winning number sets) and produce another bit array whose bits were flags for matching numbers only. Then count the bits set.

For many lotteries 64 bits would be enough, so a uint64_t should be big enough to cover this. Also, some architectures have instructions to count the bits set in a register, which some compilers might be able to recognize and optimize for.

The efficiency of this algorithm is based both on the range of lottery numbers (M) and the number of lottery numbers per ticket (N). The setting if the flags is O(N), while the and-ing of the two bit arrays and counting of the bits could be O(M), depending on if your M (lotto number range) is larger than the size that the target cpu can preform these operations on directly. Most likely, though, M will be small and its impact will likely be less than that of N on the performance.

Linear Search Algorithm Optimization

5 Answers5