0

Recently did an online programming assessment, where the question was essentially "search a number for specified integers and increment a count".

My solution was to convert the given number to a string, and then search through the string using a for loop and a switch statement. So basically:

for (int i = 0; i < str.length(); i++;)
{
    switch(str[i]) {
      case  '1':
       count++;
       break;
      case  '4': 
       count++;
       break;
      case  '8': 
       count++;
       break;
       // etc
    }
}

What I'm wondering is, is this an inefficient way of accomplishing this? I believe the time complexity is O(n), but I could be wrong. If it is inefficient, what is a better solution?

**Edited for clarification, added cases '4' and '8'

JebLab
  • 28
  • 7
  • I would probably use division and modulo operations to get the digits one by one, and then an `if` if there's only one or two digits that needs special handling and a `switch` if more. – Some programmer dude Mar 06 '22 at 00:30
  • obviously searching char-by-char is in no way efficient. Modern string libraries use SIMD to speed up this. See [Why does glibc's strlen need to be so complicated to run quickly?](https://stackoverflow.com/q/57650895/995714) – phuclv Mar 07 '22 at 03:15
  • 1
    @phuclv efficient relative to what? Is there (or can there be) a SIMD implementation of the posted code that would be faster? – Jeremy Friesner Mar 07 '22 at 03:40
  • 2
    @JeremyFriesner yes, of course, for example [How to count character occurrences using SIMD](https://stackoverflow.com/q/54541129/995714). \@JebLab you're also missing `break`s – phuclv Mar 07 '22 at 03:51
  • @phuclv that code seems to only support testing for a single character though; the code here tests for three (or more?) different characters at once. – Jeremy Friesner Mar 07 '22 at 03:57
  • 1
    @JebLab I assume you meant to include `break` statements at the end of each case in your example code? Without them, each `1` in the string will increment `count` by 3, etc, which seems unlikely to be intentional – Jeremy Friesner Mar 07 '22 at 04:01
  • I wouldn't describe this as 'searching' at all, more 'scanning'. Seems fine to me. Hand-written compiler scanners are written this way. – user207421 Mar 07 '22 at 04:29
  • @JeremyFriesner it's just an example, you can always write a SIMD solution that matches multiple values – phuclv Mar 07 '22 at 12:41
  • @JeremyFriesner Added break statements for clarity, thanks! – JebLab Mar 08 '22 at 04:08

3 Answers3

2

switch is generally not inefficient; internally it is typically implemented either as a jump-table to the various options, or as a binary search, or as a series of if-then statements, depending on what the compiler/optimizer thinks will execute fastest.

However, you could gain some efficiency in this case by avoiding the switch/case block altogether and using a lookup-table instead, like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int, char **)
{  
   const int BUFSIZE=1024*1024*10;  // 10MB
   
   char * str = new char[BUFSIZE];
   for (int i=0; i<BUFSIZE; i++) str[i] = (rand()%127)+1;
   str[BUFSIZE-1] = '\0';
      
   unsigned char mask[256];
   memset(mask, 0, sizeof(mask));
   mask['1'] = 1;
   mask['4'] = 1;
   mask['8'] = 1;

   for (int i=0; i<BUFSIZE; i++) count += mask[(unsigned)str[i]];

   printf("count=%i\n", count);
   return 0;
}

On my computer, doing it this way executed a little more than twice as quickly than the original (switch-based) approach did.

Jeremy Friesner
  • 70,199
  • 15
  • 131
  • 234
1

Is searching a string using a switch inside a for loop inefficient?

In general, no.

What I'm wondering is, is this an inefficient way of accomplishing this?

Firstly, the switch is unnecessarily complex. An if statement is simpler:

if (str[i] == '1')
    count++;

What about when there are multiple numbers to be matched?

Then a switch is probably more readable, but your example is broken since it counts 1 and 4 more than once. Fixed code:

switch(str[i]) {
  case  '1':
  case  '4': 
  case  '8': 
     count++;
}

Secondly, this is probably not optimal. It will probably be a bit faster to repeatedly use remainder operator to get the least significant digit and divide by 10, and compare the digit. This is essentially part of what the string conversion function is doing internally.

To compare asymptotic complexity, this has same time complexity as your solution, O(log N) where N is the size of the integer. The size complexity is constant while your solution requires O(log N) for the string storage. Note that the comparison of asymptotic complexity is mostly irrelevant if you use this with primitive integers due to their limited range. This matters for arbitrary precision arithmetic.

Another potential optimisation is to not use a branch in the loop, but instead to count the frequency of all integers, and in the end loop the frequencies and switch only in that loop. Benchmarking will tell if this is faster.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • What about when there are multiple numbers to be matched? For example, if you wanted to check if str[i] is equal to 1, 2, 3, or 4? Wouldn't the switch statement be better in that case rather than nested ifs? My question might have originally been a little unclear, hopefully the edits made cleared that up. – JebLab Mar 07 '22 at 03:05
  • In general a switch/case block will be more efficient (or at least no less efficient, and also more readable/maintainable) than a series of if/then tests. – Jeremy Friesner Mar 07 '22 at 03:38
1

Efficiency is generally not an issue for an operation you do once, on a small input. It's very likely that for a single integer, most of the CPU time will be spent loading the code (since it won't be in the CPU cache). And even then it's going to be less than a microsecond.

Efficiency would matter if you have a std::vector<int> with a million values. In this case, the goal is to calculate the answers as fast as your CPU can load new integers from memory - and with a vector, that's pretty fast.

As eeroika mentioned, yo do unnecessary work. You first create a string, which means you decide for each str[i] what digit it is, and then store that digit. But you don't care about most digits, and for the digits that you do care about there's no need to store them either. You just need to count whether you'd store them.

This matters because storing those digits requires writing them to memory (well, cache). On the other hand, a CPU can easily keep track of a few counters.

MSalters
  • 173,980
  • 10
  • 155
  • 350