0

Is there an implementation in C in order to search multiple strings in an array?

Given an array of strings strings[] = {"string1", "string2" ,"string3"}, how can I search if they exist in an array of strings in one pass? I would like to avoid searching the array of strings for each word in the strings[] array.

MarredCheese
  • 17,541
  • 8
  • 92
  • 91
  • What have you tried? Can you give an example API which would serve as a start point for others? – KamilCuk Nov 25 '18 at 22:47
  • You probably want a regex package that supports POSIX Extended Regular Expressions or better. That might be [PCRE](https://pcre.org/) (Perl-Compatible Regular Expressions), or just the POSIX [`regexec()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/regexec.html) set of functions. The `grep -F` mode can search (fast) for multiple words at the same time too, without requiring the user to explicitly use regex notation. – Jonathan Leffler Nov 25 '18 at 22:50
  • If this is a plain array of strings, you don't have much choice. If you are allowed to organize your data into certain structures (a hashtable or a trie), then you could get this information faster. A hashtable would give you `O(1)` response on average, i.e. you would basically do three `O(1)` lookups. If you are actually doing something like searching a sentence in a large text, then you would have to use more complex structures, probably something based on a trie. – vgru Nov 25 '18 at 23:01

1 Answers1

0

In the end, you need to compare every element in the final array with each of the search strings patterns. But there may be more efficient ways to do that and ways to avoid some comparisons if you only care to find whether the patterns exist at least once. For example:

string patterns[] = {"string1", "string2", "string3"};
int hasFoundElement[] = {0, 0, 0};
int numElementsFound = 0;

for (int i = 0; i < arrayLength; i++)
{
  for (int j = 0; j < patternsLength; j++)
  {
    if (!hasFoundElement[j] &&
        strcmp(patterns[j], array[i]) == 0)
    {
      hasFoundElement[j] = 1;
      numElementsFound++;
      if (numElementsFound == patternsLength)
      {
         return true;
      }
    }
}
return false;
Paul
  • 370
  • 1
  • 6
  • This will be less efficient than simply iterating three times, breaking each time as soon as you find the item. – vgru Nov 25 '18 at 22:53
  • @Groo Yes, in some cases. But if you have an array that's so long it's mmaped from the disk and a relatively short pattern array, iterating over the long array only once may be the better option. – Paul Nov 25 '18 at 22:55
  • breaking as soon as you are done will cut the average time in half, which will certainly have impact if you're reading from the disk. – vgru Nov 25 '18 at 23:00
  • 1
    @Groo Yes, good point. I'll update the answer to include that optimization. – Paul Nov 25 '18 at 23:05