In C or C++, you can just use malloc
to get a chunk of 100,000 bytes. Fill it with your data. To find a needle in a haystack, you can use the following code:
void *mem_mem(void *haystack, int haystack_len, void *needle, int needle_len)
{
const char *begin;
const char *const last_possible
= (const char *) haystack + haystack_len - needle_len;
if (needle_len == 0)
return (void *) &((const char *) haystack)[needle_len - 1];
for (begin = (const char *) haystack; begin <= last_possible; ++begin)
if (begin[0] == ((const char *) needle)[0] &&
!memcmp ((const void *) &begin[1],
(const void *) ((const char *) needle + 1),
needle_len - 1))
return (void *) begin;
return NULL;
}
On any reasonably modern platform, this will find any substring in 100,000 bytes in a tiny fraction of a second. You can modify it to use char *
types trivially. If you do multiple searches in the same haystack, try to only compute the haystack length once. Don't call strlen
when you don't need to.
This will be horribly sub-optimal if your haystack contains many repetitions of the first character or characters of your needle. For example, searching for "ab" in "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaqaaaa.." (or worse, "abc" in "abababababababab...abc...") will be slow. But you didn't give nearly enough details for us to determine the optimum implementation.
It's entirely possible that the point of the question is to write the algorithm with the best possible worst case performance. If so, this is probably not the "right" answer. One can imagine a haystack of all a's followed by a single b at the end and a needle consisting of all a's followed by a single b at the end. In that case, this algorithm might need a very long time.