0

how can I get the first 3 digits of a given unsigned long long without converting it to string and there is now the constant length of the number.

without using the naive approach of dividing with / 10.

like this

int first_3_digits(unsigned long long number)
{
    unsigned long long n = number;
    while(n >= 1000){ 
        n = n/10;
    }

    return n;
}

I had an interview and they said that this solution wasn't good enough the interviewer said the solution resembles a binary search. I know how binary search work but I don't know how to connect it to this problem.

pmg
  • 106,608
  • 13
  • 126
  • 198
Dolev Dublon
  • 643
  • 1
  • 6
  • 16
  • 1
    I think the interviewer meant similar to binary search, because it allows to dismiss large parts of the input quickly to get to the right answer. Dividing by 10 means going through each digit, he suggests there is a way to dismiss several digits in one iteration. – ShadowMitia May 01 '22 at 13:57
  • yes I understand is also but I don't know how to implement that, how do i do this without counting the length of the unsigned long long – Dolev Dublon May 01 '22 at 13:59
  • 2
    You might do a binary "search" for the number of digits in range 0..16. (without verifying I assume to have 19 digits as maximum for `unsigned long long`) that you have to chop off. Start chopping 8 digits. If you have more than 3 left, continue with chopped number. If you have less, continue with previous number. Then chop 4 digits... – Gerhardh May 01 '22 at 14:00
  • 1
    Are you allowed to use `log10`? – zwol May 01 '22 at 14:19
  • 2
    @zwol I thought about that, but isn't `log10` limited to floating-point? If so, you'd run into issues with the range. – beaker May 01 '22 at 14:23
  • 1
    @zwol Ah, but it wouldn't matter because you don't care about the less-significant digits, only the upper. I think that could work! – beaker May 01 '22 at 14:30
  • 2
    @DolevDublon The base-10 logarithm gives you the exponent of 10 required to give you a number, so it is the number of digits in a number represented in base 10 (with some rounding). – beaker May 01 '22 at 14:33
  • 1
    What's the source of the "given \*\*\*unsigned long long\*\*\*"? If it's a random number forget binary search, start with `if (n >= 100000000000000000000ULL) return n / 100000000000000000ULL; if (n >= 10000000000000000000ULL) return n / 10000000000000000ULL; if (n >= 1000000000000000000ULL) return n / 1000000000000000ULL; ...` – pmg May 01 '22 at 16:02
  • 1
    Using a binary search is unlikely to be more performant than [using hardware](https://en.wikipedia.org/wiki/Find_first_set). – Neil May 01 '22 at 16:13
  • Then how would you do it with hardware? – Dolev Dublon May 01 '22 at 16:18
  • I'd imagine you would have to take 2^10 after and then round, not sure. It's probably a micro-optimization anyways, all these methods work in `O(1)`. I wouldn't be surprised if the OP's code beat all of the methods. – Neil May 01 '22 at 16:34
  • 1
    @Neil I tried such an approach and it turns out that this is not much faster than the solution of Gerhardh. The fastest solution are bound by the cost of *branch misprediction* (and the one of the indirect jump for the proposed one). – Jérôme Richard May 01 '22 at 19:41

6 Answers6

3

Finding the "best" solution often depends on the criteria you define to matter. If you want smallest or simplest solution, your approach is not bad.

If you want fasted solution (and got the hint about "binary search" from the interviewer), then you might try something like this (not tested):

int first_3_digits(unsigned long long number)
{
    unsigned long long n = number;
    unsigned long long chopped;

    // n has up to 20 digits. We need to chop up to 17 digits.

    // Chop 8 digits
    chopped = n / 100000000ull;
    if (chopped >= 100)
    {
        n = chopped;
    }
    // chopped has up to 12 digits,
    // If we use old n we have up to 11 digits
    // 9 more to go...

    // Chop 4 digits
    chopped = n / 10000ull;
    if (chopped >= 100)
    {
        n = chopped;
    }
    // chopped has up to 8 digits,
    // If we use old n we have up to 7 digits
    // 5 more to go...

    // Chop 2 digits
    chopped = n / 100ull;
    if (chopped >= 100)
    {
        n = chopped;
    }
    // chopped has up to 6 digits,
    // If we use old n we have up to 5 digits
    // 3 more to go...

    // Chop 2 digits again
    chopped = n / 100ull;
    if (chopped >= 100)
    {
        n = chopped;
    }
    // chopped has up to 4 digits,
    // If we use old n we have up to 3 digits
    // 1 more to go...

    // Chop last digit if required.
    if (n >= 1000)
    {
        n /= 10;
    }

    return n;
}

For 64 bit values, maximum number is 18446744073709551615, i.e. 20 digits. We have to remove at most 17 digits. As this is not a perfect number to use powers of 2 for number of digits to chop, we repeat the step to chop 2 digits.

That solution might be a bit faster but is likely to take more code.

Gerhardh
  • 11,688
  • 4
  • 17
  • 39
3

Modern processors are able to compute the log2 of an integer in only few cycles using specific low-level instructions (eg. bsr on mainstream x86-64 processors). Based on this great previous post one can compute the log10 of an integer very quickly. The idea is to use a lookup table so to do the translation between the log2 and log10. Once the log10 has been computed, one can just use another lookup table to perform the division by 10 ** log10(number). However, non-constant 64-bit divisions are very expensive on almost all processors. An alternative solution is to use a switch with all the possible case so the compiler can generate an efficient code for all cases and use a fast jump table. Indeed, a division by a constant can be optimized by the compiler in few fast instructions (ie. multiplications and shifts) that are much faster than a non-constant division. The resulting code is not very beautiful/simple though. Here it is:

#include <math.h>
#include <assert.h>

static inline unsigned int baseTwoDigits(unsigned long long x) {
    return x ? 64 - __builtin_clzll(x) : 0;
}

static inline unsigned int baseTenDigits(unsigned long long x) {
    static const unsigned char guess[65] = {
        0,  0,  0,  0,  1,  1,  1,  2,  2,  2,  3,  3,  3,  3,  4,  4,  4,
        5,  5,  5,  6,  6,  6,  6,  7,  7,  7,  8,  8,  8,  9,  9,  9,  9,
       10, 10, 10, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15,
       15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19
    };
    static const unsigned long long tenToThe[] = {
        1ull, 10ull, 100ull, 1000ull, 10000ull, 100000ull, 1000000ull, 10000000ull, 100000000ull,
        1000000000ull, 10000000000ull, 100000000000ull, 1000000000000ull,
        10000000000000ull, 100000000000000ull, 1000000000000000ull,
        10000000000000000ull, 100000000000000000ull, 1000000000000000000ull,
        10000000000000000000ull
    };
    unsigned int digits = guess[baseTwoDigits(x)];
    return digits + (x >= tenToThe[digits]);
}

inline int optimized(unsigned long long number)
{
    const unsigned int intLog = baseTenDigits(number);
    switch(intLog)
    {
        case  0: return number;
        case  1: return number;
        case  2: return number;
        case  3: return number;
        case  4: return number / 10ull;
        case  5: return number / 100ull;
        case  6: return number / 1000ull;
        case  7: return number / 10000ull;
        case  8: return number / 100000ull;
        case  9: return number / 1000000ull;
        case 10: return number / 10000000ull;
        case 11: return number / 100000000ull;
        case 12: return number / 1000000000ull;
        case 13: return number / 10000000000ull;
        case 14: return number / 100000000000ull;
        case 15: return number / 1000000000000ull;
        case 16: return number / 10000000000000ull;
        case 17: return number / 100000000000000ull;
        case 18: return number / 1000000000000000ull;
        case 19: return number / 10000000000000000ull;
        case 20: return number / 100000000000000000ull;
        default: assert(0); return 0;
    }
}

Note that this code use the non-standard compiler built-in __builtin_clzll available on both Clang and GCC. For MSVC, please read this post.

[Update] The previous benchmark did not inline the code of the proposed function as opposed to the others resulting in a slower execution (especially on Clang). Using static+inline helped the compiler to properly inline the function calls.

Results

Here is the results of the methods on QuickBench respectively on GCC and Clang (with -O3). Note that the input distribution is chosen so that the logarithm are quite uniform and pseudo-random (ie. logarithm uniform). This choice as been made since the interviewer said a binary-search was a good solution and this distribution is a best one for such algorithm.

GCC benchmark

Clang benchmark

One can see that this solution is the fastest. The one of Yakov Khodorkovski and qqNade give wrong results for big values due to floating point rounding. The one of qqNade is not presented in the benchmark for sake of clarity as it is >10 time slower than the original one.

The reason why the solution of Gerhardh is so fast with Clang is that the compiler is able to partially generate fast conditional moves instead of slow conditional branches. This optimization is insanely clever since this is only possible on 32-bit integers (and also if the optimization about the division by a constant is performed before), but Clang is able to know that n is small enough after the 2 first conditions! That being said, this optimization is fragile since few changes in the code often appears to break it.

One can note that the original code is surprisingly quite fast (especially on GCC). This is due to branch prediction. Modern processors execute many iterations without checking they should be executed (and roll back if needed). Each iteration is very fast since the division by a constant is optimized: it only takes 2 cycle/iterations on my machine. On modern x86-64 Intel processors a branch misprediction takes 14 cycles while a well-predicted branch takes only 1-2 cycles (similar on AMD Zen processors). The average number of iteration is ~9 and only the last iteration is expensive. The solution of Gerhardh results in far less instructions executed, but it can result in up to 4 miss-predicted branch with GCC and up to 2 with Clang. The proposed solution results in only 1 indirect miss-predicted branch (that processors can less easily execute efficiently though). Since the optimized implementations runs in only ~10 cycles in average on Quickbench, the effect of branch misprediction is huge.

Note that using other input distribution have an impact on results though the overall trend remains the same. Here are results for a uniform distribution: GCC and Clang. The original algorithm is significantly slower since the average number of digits is twice bigger (17~18 instead of ~9) so is the number of iterations. The speed of the other algorithms is not very different compared to the previous distribution and the trend is left unchanged overall for them.

Conclusion

To conclude, the solution of Gerhardh is relatively portable, simple and pretty fast. The new proposed solution is more complex, but it is the fastest on both GCC and Clang. Thus, the solution of Gerhardh should be preferred unless the performance of this function is very important.

Jérôme Richard
  • 41,678
  • 6
  • 29
  • 59
  • 2
    Great comparison! +1 – Gerhardh May 01 '22 at 20:17
  • 1
    (My guess is `3 * baseTwoDigits(x) / 10`.) – greybeard May 01 '22 at 21:31
  • 1
    @greybeard Indeed, it is not used anymore. I removed it. Thank you. – Jérôme Richard May 01 '22 at 21:46
  • 1
    What inputs did you use for the benchmark? – rici May 01 '22 at 22:14
  • 1
    @rici The ones in the `values` array of the provided QuickBench benchmark in the answer (eg. [here](https://www.quick-bench.com/q/IGwHO4dldKqX74hT37b6Mk936mc)). The values are chosen so the logarithm is quite uniformy random (not truly in practice since only two bits are set but I guess it should be sufficient for the measure). – Jérôme Richard May 01 '22 at 22:23
  • 1
    And what evidence do you have that the actual distribution of inputs will be logarithmically uniform, as opposed to uniform? – rici May 01 '22 at 22:28
  • 1
    ("Obviously", this is a question OP should have asked the interviewer. But even still, you can't arbitrarily choose an input distribution which disproportionately benefits a particular implementation.) – rici May 01 '22 at 22:30
  • 1
    @rici You can check yourself and compute the log10 of the sorted value and then truncate it to integers. The result is: `{0,1,1,2,2,2,2,3,3,3,3,4,6,6,6,6,6,6,6,7,7,7,7,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,16,16,16,16,16,16,16,16,16,16,17,17,17,17,17,18,18,18,18,18,18,18,18}`. This is not perfect as very small numbers are less present but certainly enough for the test. – Jérôme Richard May 01 '22 at 22:39
  • 1
    On the contrary, small numbers are *too* present. If it were a uniform distribution over the range of `unsigned long long`, (almost) all the values would have log10 of 17 or 18. I'm writing up an answer with this fact. – rici May 01 '22 at 22:41
  • 1
    @rici As said before, it is not a uniform distribution. Only the *logarithm* was meant to be uniform. As for the choice, the uniform distribution is biased as the logarithmically uniform one. With the uniform distribution values will generally have the same exponent (close to 17 or 18 as you said) and there is no reason to believe such value will be use in a realistic program. In fact such distribution is very unusual for integer in programs. Small values are often much more frequent, especially values in the range 0-255. You can scan your own RAM to observe that. – Jérôme Richard May 01 '22 at 22:48
  • 2
    Completely depends on the problem domain, which is why you have to ask the interviewer (or whoever assigns you the task). – rici May 01 '22 at 22:51
  • 1
    @rici Theoretically, yeah, I agree. But in practice The OP did not mention it and I doubt it know that. The interview is now over. That being said, if the interviewer ask for the use of a binary search, then it means the log uniform distribution is more probable as it would be a bad idea (ie. inefficient) to use that for a uniform distribution or for small numbers. The OP code would be good for small number and a pseudo-reversed loop would be good for uniform numbers. The interviewer explicitly said that "*this solution wasn't good*" and "*the solution resembles a binary search*". – Jérôme Richard May 01 '22 at 22:58
  • 1
    You're probably right that the OP doesn't know, because they probably didn't ask, which they should have done. A good interviewer will welcome relevant questions, and this question is relevant. I agree that the comment about binary search is evidence that the interviewer had some other distribution floating around in their mind, but they could be mistaken. Maybe they also never really thought through the problem. It happens. It's happened to me when I did interviews. Clarifying preconditions is more useful than speculative optimization and an interviewer should recognize and reward that. IMHO. – rici May 01 '22 at 23:07
  • 1
    @rici I fully agree. I added results for the uniform distribution and made this choice more explicit. I found that the proposed function was not inlined so I fixed that and this resulted in a slightly faster implementation. Now the proposed implementation is the fastest. The results for the uniform distribution are not very different from the previous one except for the original algorithm (which is expected). Thank you. – Jérôme Richard May 02 '22 at 00:09
  • 1
    Thanks. As expected, my solution is much faster on a uniforrm sample. (I'm not saying that's the correct distribution to optimise for. It's just an example of how much difference that makes.) https://www.quick-bench.com/q/Lr16ffRPRHIKRhpW7IZFVsnAVb4 on uniform distribution; https://www.quick-bench.com/q/VDmFSCFRByUrKFnSccRjgHZKfOk on logarithmically uniform. – rici May 02 '22 at 00:26
3

When the interviewer says:

the solution resembles a binary search

that's evidence that the interviewer has in mind a particular distribution of inputs for this function, and that distribution is not a uniform distribution over the range of unsigned long long values. At that point, it's necessary to ask the interviewer what input distribution they expect, since it's not possible to optimise algorithms like this without knowing that.

In particular, if the inputs were selected from a uniform sample of the range of 64-bit unsigned values, then the following simple function would be close to optimal:

/* See Note 1 for this constant */
#define MAX_POWER (10ULL * 1000 * 1000 * 1000 * 1000 * 1000)

int first_3_digits(unsigned long long n) {
  if (n >= 1000) { /* Almost always true but needed for correctness */
    while (n < MAX_POWER * 100) n *= 10;
    if (n >= MAX_POWER * 1000) n /= 10;
    n /= MAX_POWER;
  }
  return n;
}

I hope this demonstrates how much difference the expected input distribution makes. The above solution optimises for the case where the input is close to the maximum number of digits, which is almost always the case with a uniform distribution. In that case, the while loop will hardly ever execute (more precisely, it will execute about 1/1844 times).

That solution also has the advantage that the while loop, even if it does execute, only does multiplications by 10, not divisions by 10. Both GCC and Clang understand how to optimise divisions by constants into multiplication and shift, but a multiplication is still faster, and multiplication by 10 is particularly easy to optimise. [Note 2]


Notes

  1. Note that the constant MAX_POWER was precomputed for the case where unsigned long long is a 64-bit value. That's not guaranteed, although it's common, and it's also the minimum possible size of an unsigned long long. Computing the correct value with the C preprocessor is possible, at least up to a certain point, but it's tedious, so I left it out. The value needed is the largest power of 10 no greater than ULLONG_MAX / 1000. (Since unsigned long long is always a power of 2, it cannot be a power of 10 and the test could equally be for the largest power of 10 less than ULLONG_MAX / 1000, if that were more convenient.)

  2. In the benchmark provided by Jérôme Richard, whose sample inputs are chosen so that their logarithms are roughly uniform --a very different input distribution--, this comes out a bit slower than his optimised solution, although it's within the margin of error of the benchmark tool. (On the other hand, the code is quite a bit simpler.) On Jérôme's second benchmark, with a uniform sample, it comes out a lot faster.

rici
  • 234,347
  • 28
  • 237
  • 341
  • 1
    Minor: one less compare with larges values: `if (n >= MAX_POWER * 1000) n /= 10; else while (n < MAX_POWER * 100) n *= 10;`. – chux - Reinstate Monica May 02 '22 at 15:08
  • 2
    @chux: i was going to do something like that until I noticed that the compiler also thought of that optimisation. :-) I might make a couple of tweaks, though. – rici May 02 '22 at 19:26
2
  • The hint for a binary search implies the last few tests and divides should be as below, using powers of 10: ..., 8, 4, 2, 1

    // Pseudo code
    if (big enough)
      divide by 100,000,000
    if (big enough)
      divide by 10,000 
    if (big enough)
      divide by 100 
    if (big enough)
      divide by 10
    
  • Working this backwards we need up to 5 divisions for a 64-bit unsigned long long.

  • As unsigned long long may be wider than 64, add tests for that.

  • Provide not only a solution, but a test harness to demo the correctness.

Example:

#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned first3(unsigned long long x) {
  static const unsigned long long ten18 = 1000000ull * 1000000 * 1000000;
  static const unsigned long long ten16 = 10000ull * 1000000 * 1000000;
  static const unsigned long long ten10 = 10000ull * 1000000;
  static const uint32_t ten8 = 100ul * 1000000;
  static const uint32_t ten6 = 1000000u;
  static const uint32_t ten4 = 10000u;
  static const uint32_t ten3 = 1000u;
  static const uint32_t ten2 = 100u;

  // while loop used in case unsigned long long is more than 64-bit
  // We could use macro-magic to make this an `if` for common 64-bit.
  while (x >= ten18) {
    x /= ten16;
  }

  if (x >= ten10) {
    x /= ten8;
  }
  if (x >= ten6) {
    x /= ten4;
  }
  uint32_t x32 = (uint32_t) x;  // Let us switch to narrow math
  if (x32 >= ten4) {
    x32 /= ten2;
  }
  if (x32 >= ten3) {
    x32 /= 10;
  }
  return (unsigned) x32;
}

Test code

int test_first3(void) {
  char buf[sizeof(unsigned long long) * CHAR_BIT];
  for (size_t i = 1;; i++) {
    for (int dig = '0'; dig <= '9'; dig += 9) {
      memset(buf, dig, i);
      if (dig == '0') {
        buf[0]++;
      }
      buf[i] = '\0';
      errno = 0;
      unsigned long long x = strtoull(buf, 0, 10);
      if (errno) {
        puts("Success!");
        return 0;
      }
      unsigned f3 = first3(x);
      char buf3[sizeof(unsigned) * CHAR_BIT];
      int len = sprintf(buf3, "%u", f3);
      printf("%2zu <%s> <%s>\n", i, buf3, buf);
      if (len > 3) {
        printf("i:%zu dig:%c\n", i, dig);
        return -1;
      }
      if (strncmp(buf, buf3, 3) != 0) {
        return -1;
      }
    }
  }
}

int main(void) {
  test_first3();
}

Output

 1 <1> <1>
 1 <9> <9>
 2 <10> <10>
 2 <99> <99>
 3 <100> <100>
 3 <999> <999>
 4 <100> <1000>
 4 <999> <9999>
 5 <100> <10000>
 5 <999> <99999>
 ...
17 <100> <10000000000000000>
17 <999> <99999999999999999>
18 <100> <100000000000000000>
18 <999> <999999999999999999>
19 <100> <1000000000000000000>
19 <999> <9999999999999999999>
20 <100> <10000000000000000000>
Success!
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
0

here a short solution using log10 function :

int first_n_digits(unsigned long long number){
    return number < 1000 ? number =  (int)number : number = (int)(number/pow(10,(int)(log10(number)+1)-3));
}
Y103
  • 9
  • 1
-1

You can calculate the length of a number (its power) using the decimal logarithm, then use the decimal exponent to get a divisor less than three orders of magnitude and integer divide the number by it:

#include <assert.h>
#include <math.h>

int first_3_digits(unsigned long long number)
{
     if(number < 1000)
         return number;
     int number_length = int(floor(log10l(number)))+1;
     assert(number_length > 3);
     unsigned long long divider = exp10l(number_length - 3);
     return number/divider;
}

int main()
{
     assert(first_3_digits(0)==0);
     assert(first_3_digits(999)==999);
     assert(first_3_digits(1234)==123);
     assert(first_3_digits(9876543210123456789ull)==987);
     return 0;
}
qqNade
  • 1,942
  • 8
  • 10
  • 1
    I was thinking about upvoting it, but then I realized you use floating-point numbers. For this is a no-flyer... – Lindydancer May 01 '22 at 15:11
  • "#include " was added by me to print the results during debugging, that code was later removed but the line remained and it does not change the behavior of the program in any way. But technically you're right, so you can safely remove the third line "#include " and I've just done it now. – qqNade May 01 '22 at 15:36
  • 2
    It does not work for large numbers like 999999999999999999ull (64-bit) due to the FP rounding. – Jérôme Richard May 01 '22 at 15:43