Efficient memcspn

Question

Does anyone know of an efficient implementation of a memcspn function?? It should behave like strcspn but look for the span in a memory buffer and not in a null terminated string. The target compiler is visualC++ .

Thanks, Luca

R.. GitHub STOP HELPING ICE · Accepted Answer · 2010-08-30T22:02:51.973

2

One near-optimal implementation:

size_t memcspan(const unsigned char *buf, size_t len, const unsigned char *set, size_t n)
{
    size_t i;
    char set2[1<<CHAR_BIT] = {0};
    while (n--) set2[set[n]] = 1;
    for (i=0; i<len && !set2[buf[i]]; i++);
    return i;
}

It might be better to use a bit array instead of a byte array for set2, depending on whether arithmetic or a little bit more cache thrashing is more expensive on your machine.

edited Aug 30 '10 at 22:02

answered Aug 30 '10 at 17:44

R.. GitHub STOP HELPING ICE

208,859
35
376
711

1

One nit, that is easily repaired: You've confused `strcspn()` with `strpbrk()`. The latter returns a pointer to the first instance from `set`, the former returns the offset to the first instance. Obviously, the same underlying logic gets you either, aside from the edge case of no members of `set` in `buf`. – RBerteig Aug 30 '10 at 20:30

score 0 · Answer 2 · answered Aug 30 '10 at 17:06

0

It would seem pretty difficult to write an inefficient implementation of this function, TBH - the implementation seems pretty straightforward, so I'd suggest writing this yourself if you can't find an implementation in a reasonable timeframe.

answered Aug 30 '10 at 17:06

Will A

24,780
5
50
61

1

It's actually pretty easy to write an inefficient implementation - for large character sets, e.g. 32-bit `wchar_t`, the inefficient implementation is the only practical implementation and it's `O(nm)`. – R.. GitHub STOP HELPING ICE Aug 30 '10 at 17:36
@R.: Well, an `O((m + n) log m)` implementation isn't too hard (sort `needle` then binary search within it rather than linear search). And a method similar to yours but using a hash table rather than a simple array would work for 32-bit `wchar_t` and have a similar running time in the average case. – caf Aug 31 '10 at 03:39
Sorting the "needle" (sorta a misnomer - actually the set of characters) is `O(m)` in space, since you need working space for a copy of it. It's conceivable that this could make the approach infeasible. At least you'd need a fallback case for huge sets. – R.. GitHub STOP HELPING ICE Aug 31 '10 at 04:11
Of course, a saner design would be to just require the `set` argument to already be sorted... if you can make this requirement. – R.. GitHub STOP HELPING ICE Aug 31 '10 at 04:12
@all: Very good points - I hadn't considered the search-for-chars being anything more than a few chars - so lack of binary search does make the naive approach much more inefficient. – Will A Aug 31 '10 at 06:37

Efficient memcspn

2 Answers2

Linked