How do I do the in-place equivalent of strstr()
for a counted string (i.e. not null-terminated) in C?
Asked
Active
Viewed 2.2k times
34

user541686
- 205,094
- 128
- 528
- 886
-
3You'll have to write your own version. – Seth Carnegie Dec 21 '11 at 03:13
-
Which string isn't null-terminated? The string being searched, or the sub-string? – Dec 21 '11 at 03:15
-
@TimCooper: The one being searched (haystack). – user541686 Dec 21 '11 at 03:16
-
@SethCarnegie: It's not exactly trivial... I could try KMP or something if I really need to, but I'd rather avoid it if I can. – user541686 Dec 21 '11 at 03:22
-
3You can steal the implementation of `strnstr()` from BSD. But look out for this bug: http://www.mikeash.com/pyblog/dont-use-strnstr.html – Matt K Dec 21 '11 at 03:22
-
@mkb: ... Thanks for the link, I'll take a look but that link makes me a bit nervous haha. – user541686 Dec 21 '11 at 03:26
-
Have you looked at Boyer-moore string searching? It works quite fine without terminating `\0`s and is O(4*n). – moshbear Dec 21 '11 at 05:30
-
@moshbear: I had heard about it once but I know little about it and I'd totally forgotten it -- I'll look at that, thanks! – user541686 Dec 21 '11 at 05:38
-
Yes, Boyer-Moore is very good. It's a bit more complicated than KMP, so if you don't find a ready-made implementation, it may not be worthwhile to cook one up yourself. If performance is really important, I would however recommend BM, especially if needles are long. (Best case, BM looks only at every `m`th character.) – Daniel Fischer Dec 21 '11 at 05:50
-
3glibc has [memmem](http://stackoverflow.com/questions/24263582/is-there-a-particular-reason-for-memmem-being-a-gnu-extension) (needle and haystack both counted), I'm sure there will be a public domain implementation out there too. – M.M May 05 '15 at 13:06
4 Answers
8
See if the function below works for you. I haven't tested it thoroughly, so I would suggest you do so.
char *sstrstr(char *haystack, char *needle, size_t length)
{
size_t needle_length = strlen(needle);
size_t i;
for (i = 0; i < length; i++) {
if (i + needle_length > length) {
return NULL;
}
if (strncmp(&haystack[i], needle, needle_length) == 0) {
return &haystack[i];
}
}
return NULL;
}
-
That's actually similar to what I'm currently using, but it's O(mn), whereas (I'm assuming) `strstr` is O(m + n). So I'm looking for something that's not ridiculously slow like my version. :-) But +1 anyway, since the idea works. – user541686 Dec 21 '11 at 03:24
-
@Mehrdad: Might also be worth it to take a peek at this implementation: http://src.gnu-darwin.org/src/lib/libc/string/strnstr.c.html – Dec 21 '11 at 03:26
-
Wow, I guess I was wrong then... so `strstr` is typically defined to be an O(mn) operation?? Thanks for pointing that out... then I'll probably accept this in a bit, since it's the exact substitute for the question. – user541686 Dec 21 '11 at 03:27
-
@Mehrdad: I cleaned up my solution a bit, if you would like to take a look at it again. – Dec 21 '11 at 03:32
-
@Mehrdad C does not specify/define the O() of `strstr()`. – chux - Reinstate Monica May 05 '15 at 16:40
-
You could avoid the per-loop check for `if (i + needle_length > length)` by just reducing `length` appropriately so the `for` loop condition is correct. `if (needle_length > length) return NULL;`, then `length -= needle_length;`, and the `for` loop becomes `for (i = 0; i <= length; i++)` (changing `<` to `<=` to ensure you check the last possible position). The first `if` check is only necessary because you use `size_t`; using `ssize_t` would remove the need for that check. Good compiler might be smart enough to eliminate `if` for you, but why write complex code and hope if you don't need to. – ShadowRanger Aug 31 '16 at 17:16
-
It would also be possible to optimize an implementation like this by: 1) Using `memcmp` to avoid `NUL` checking of `strncmp`. 2) Using `memchr` to find the first character of the needle in the haystack, so you're not making a `strncmp`/`memcmp` function call for every character (likely reducing the number of calls by a factor of 50x-100x). Or skip memchr and just test the first character manually before calling `strncmp`/`memcmp`. – ShadowRanger Aug 31 '16 at 17:23
8
If you're afraid of O(m*n) behaviour - basically, you needn't, such cases don't occur naturally - here's a KMP implementation I had lying around which I've modified to take the length of the haystack. Also a wrapper. If you want to do repeated searches, write your own and reuse the borders
array.
No guarantees for bug-freeness, but it seems to still work.
int *kmp_borders(char *needle, size_t nlen){
if (!needle) return NULL;
int i, j, *borders = malloc((nlen+1)*sizeof(*borders));
if (!borders) return NULL;
i = 0;
j = -1;
borders[i] = j;
while((size_t)i < nlen){
while(j >= 0 && needle[i] != needle[j]){
j = borders[j];
}
++i;
++j;
borders[i] = j;
}
return borders;
}
char *kmp_search(char *haystack, size_t haylen, char *needle, size_t nlen, int *borders){
size_t max_index = haylen-nlen, i = 0, j = 0;
while(i <= max_index){
while(j < nlen && *haystack && needle[j] == *haystack){
++j;
++haystack;
}
if (j == nlen){
return haystack-nlen;
}
if (!(*haystack)){
return NULL;
}
if (j == 0){
++haystack;
++i;
} else {
do{
i += j - (size_t)borders[j];
j = borders[j];
}while(j > 0 && needle[j] != *haystack);
}
}
return NULL;
}
char *sstrnstr(char *haystack, char *needle, size_t haylen){
if (!haystack || !needle){
return NULL;
}
size_t nlen = strlen(needle);
if (haylen < nlen){
return NULL;
}
int *borders = kmp_borders(needle, nlen);
if (!borders){
return NULL;
}
char *match = kmp_search(haystack, haylen, needle, nlen, borders);
free(borders);
return match;
}

Daniel Fischer
- 181,706
- 17
- 308
- 431
5
I just came across this and I'd like to share my implementation. It think it quite fast a I don't have any subcalls.
It returns the index in the haystack where the needle is found or -1 if it was not found.
/* binary search in memory */
int memsearch(const char *hay, int haysize, const char *needle, int needlesize) {
int haypos, needlepos;
haysize -= needlesize;
for (haypos = 0; haypos <= haysize; haypos++) {
for (needlepos = 0; needlepos < needlesize; needlepos++) {
if (hay[haypos + needlepos] != needle[needlepos]) {
// Next character in haystack.
break;
}
}
if (needlepos == needlesize) {
return haypos;
}
}
return -1;
}

Alex
- 51
- 1
- 2
0
I used this method
int memsearch(char* dataset, int datasetLength, char* target, int targetLen){
for(int i = 0; i < datasetLength; i++){
if(dataset[i] == target[0]){
int found = 1;
for(int j = 0; j < targetLen; j++){
int k = i + j;
if(k >= datasetLength || target[j] != dataset[k]){
found = 0;
break;
}
}
if(found) return i;
}
}
return -1;
}

Jeremiah
- 99
- 1
- 9