2

Does anyone know of an efficient implementation of a memcspn function?? It should behave like strcspn but look for the span in a memory buffer and not in a null terminated string. The target compiler is visualC++ .

Thanks, Luca

starblue
  • 55,348
  • 14
  • 97
  • 151
luca
  • 369
  • 1
  • 5
  • 7

2 Answers2

2

One near-optimal implementation:

size_t memcspan(const unsigned char *buf, size_t len, const unsigned char *set, size_t n)
{
    size_t i;
    char set2[1<<CHAR_BIT] = {0};
    while (n--) set2[set[n]] = 1;
    for (i=0; i<len && !set2[buf[i]]; i++);
    return i;
}

It might be better to use a bit array instead of a byte array for set2, depending on whether arithmetic or a little bit more cache thrashing is more expensive on your machine.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • 1
    One nit, that is easily repaired: You've confused `strcspn()` with `strpbrk()`. The latter returns a pointer to the first instance from `set`, the former returns the offset to the first instance. Obviously, the same underlying logic gets you either, aside from the edge case of no members of `set` in `buf`. – RBerteig Aug 30 '10 at 20:30
0

It would seem pretty difficult to write an inefficient implementation of this function, TBH - the implementation seems pretty straightforward, so I'd suggest writing this yourself if you can't find an implementation in a reasonable timeframe.

Will A
  • 24,780
  • 5
  • 50
  • 61
  • 1
    It's actually pretty easy to write an inefficient implementation - for large character sets, e.g. 32-bit `wchar_t`, the inefficient implementation is the only practical implementation and it's `O(nm)`. – R.. GitHub STOP HELPING ICE Aug 30 '10 at 17:36
  • @R.: Well, an `O((m + n) log m)` implementation isn't too hard (sort `needle` then binary search within it rather than linear search). And a method similar to yours but using a hash table rather than a simple array would work for 32-bit `wchar_t` and have a similar running time in the average case. – caf Aug 31 '10 at 03:39
  • Sorting the "needle" (sorta a misnomer - actually the set of characters) is `O(m)` in space, since you need working space for a copy of it. It's conceivable that this could make the approach infeasible. At least you'd need a fallback case for huge sets. – R.. GitHub STOP HELPING ICE Aug 31 '10 at 04:11
  • Of course, a saner design would be to just require the `set` argument to already be sorted... if you can make this requirement. – R.. GitHub STOP HELPING ICE Aug 31 '10 at 04:12
  • @all: Very good points - I hadn't considered the search-for-chars being anything more than a few chars - so lack of binary search does make the naive approach much more inefficient. – Will A Aug 31 '10 at 06:37