Is there an efficient way to find all occurrences (including overlapping) of non const char *str2
in char *str1
and outputting the numbers position of matches in str1 in C (not in C++ as it differs)?
Asked
Active
Viewed 1.3k times
1

NGix
- 2,542
- 8
- 28
- 39
-
3Do you need overlapping or non-overlapping occurrences? Are you aware of `strstr()`? What have you tried? – Jonathan Leffler Nov 20 '12 at 21:34
-
It sounds like you need strstr(). – imreal Nov 20 '12 at 21:35
-
I need including overlapping and I'm not aware of strstr() ;) – NGix Nov 20 '12 at 21:38
-
How long is the string we're talking about? If it's really long, you might consider some pre-processing for fast operation afterwards. Same goes if the string changes infrequently and is queried often. – cxxl Nov 20 '12 at 21:38
-
Here's a reference for [`strstr()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/strstr.html) — yes, that's POSIX, but for this function, it matches the C standard exactly. You need to learn which functions are in the standard C library and how to use them. Then questions like this become easy. – Jonathan Leffler Nov 20 '12 at 21:41
-
I saw this reference, but it finds only 1 occurrence and I'm not sure if it's faster in some algorithm then for example this method [link](http://en.wikipedia.org/wiki/Suffix_array) – NGix Nov 20 '12 at 21:44
-
Try it and see. It depends on what you're doing, and how often you're doing it, but I'd lay odds that for many purposes with a single scan through the data, the cost penalty of building a Suffix Array outweighs any performance benefit in the (one time) search phase. If you're going to be repeatedly analyzing the data, possibly with different strings, then pre-processing may provide a big win. But for most naïve applications, they're unlikely to help. – Jonathan Leffler Nov 20 '12 at 21:49
-
2If `str2` is constant and you want to find it in several `str1`, a string search algorithm like Knuth-Morris-Pratt or Boyer-Moore can be a win over `strstr`, also for pathological cases if you're only scanning one `str1`. – Daniel Fischer Nov 20 '12 at 21:52
2 Answers
1
Your function will use strstr()
in a while
loop to find the first match of str2
in str1
. You can then print the offset of that match. You'll continue the search at the first character after the match. You'll stop the loop when strstr()
no longer finds a match (signalled by strstr()
returning NULL).
If you needed non-overlapping, you'd want to know the length of str2
and you'd start the next search at the matched character plus the length of str2
.

Jonathan Leffler
- 730,956
- 141
- 904
- 1,278
-
That's okay, but I'm looking for more efficient algorithm e.g. suffix array, because `str1` can be really long and program execution time is limited – NGix Nov 20 '12 at 21:57
-
Try `strstr()`; it can be hard to beat. At least on Mac OS X, I tried BM and KMP in C and didn't get close to `strstr()` in performance. I was both surprised and disappointed. Are you doing 'bioinformatics' searching on ACGTAGGTCA type strings? – Jonathan Leffler Nov 20 '12 at 21:59
0
Using strstr()
inside a loop:
int get_substr_count(const char * haystack, const char *needle)
{
int count = 0;
const char *tmp = haystack;
while( tmp = strstr( tmp, needle)){
printf( "Position: %d\n", (int)(tmp-haystack));
++count;
}
return count;
}

Vyktor
- 20,559
- 6
- 64
- 96
-
That counts the matches; it's not the same as printing the positions of the matches. – Jonathan Leffler Nov 20 '12 at 21:50
-
2you should increment tmp inside while loop, else this code goes to infinite loop because it keeps searching from same index. – Jignesh Oct 20 '14 at 12:55
-
1You should increment tmp with strlen(needle), because after the first occurrence of needle, strstr will return the tmp reference and it goes to infinite loop. – Andrei Minca Mar 02 '20 at 22:50