-1

Here is what I came up with.

  • len is guaranteed to have meaningful value (positive and true size of the char array)
  • s is long unsigned number as a string without null-termination (received from 3rd party lib) typically 11-12 symbols e.g. "123456789000"
  • running on x86 linux

I am not C++ dev, could you help make it faster?

    inline uint64_t strtol(char* s, int len)
     {
         uint64_t val = 0;
         for (int i = 0; i < len; i++)
         {
             char c = *(s + i) - '0';
             val = val * 10 + c;
         }
         return val;
     };
Boppity Bop
  • 9,613
  • 13
  • 72
  • 151
  • 3
    whats wrong with `std::strtol` ? – 463035818_is_not_an_ai Mar 31 '22 at 13:19
  • 7
    This is about as fast as you can get. All of the standard functions required a null terminated string or a `std::string` and constructing those takes about much time as just running this function. Are you sure this is the functions that needs to be improved? Did you use a profiler to determine this function is slowing your code down? – NathanOliver Mar 31 '22 at 13:19
  • 1
    @463035818_is_not_a_number That requires a null terminated string, which the OP does not have. – NathanOliver Mar 31 '22 at 13:21
  • This isn't all that related to C++. At this point you're looking at assembly. BTW it should be `const char*`. – Passer By Mar 31 '22 at 13:21
  • I doubt this function can be the bottleneck. Reading the memory from the `char*` most likely is. – Jeffrey Mar 31 '22 at 13:21
  • Note that the `inline` probably doesn't do anything here. The compiler will inline the functions it wants to inline and for most compilers the keyword's presence or absence is ignored. I'd also make `s` a `const char* s`, it just makes the function easier to use and implicitly clarifies the usage. – François Andrieux Mar 31 '22 at 13:24
  • 5
    if the missing nullterminator is the issue then maybe [`std::from_chars`](https://en.cppreference.com/w/cpp/utility/from_chars) – 463035818_is_not_an_ai Mar 31 '22 at 13:24
  • @FrançoisAndrieux `inline` absolutely does do something, it just doesn't control inlining optimisations. If a function is defined in a header file `inline` is [required to avoid multiple definition errors](https://en.cppreference.com/w/cpp/language/inline) – Alan Birtles Mar 31 '22 at 13:26
  • can you clarify why you want it faster? How fast is it now? How fast do you need it? If you are merely looking for fast, because fast is good then use something from the standard library. Usually it is difficult to beat it in performance by rolling your own – 463035818_is_not_an_ai Mar 31 '22 at 13:27
  • what I mean: You should clarify the quesiton. Are you looking for a way to get the integer from the string or do you need specifically this code to run faster? If it is the latter then you could explain more – 463035818_is_not_an_ai Mar 31 '22 at 13:28
  • @OP Please realize that the code you write is only a description of what you want to occur. The compiler's optimizer takes over and will more than likely rearrange your code so that it looks very little like the code you posted. Thus you need to look at the generated assembly language after optimizations are done, and make conclusions from that code. – PaulMcKenzie Mar 31 '22 at 13:31
  • @AlanBirtles Maybe I should have specified that it doesn't do anything regarding the stated goal "make it faster". But I thought it was implied given the context. – François Andrieux Mar 31 '22 at 13:34
  • 1
    SIMD and `std::from_chars` would almost be the fastest way: [Most insanely fastest way to convert 9 char digits into an int or unsigned int](https://stackoverflow.com/q/70420948/995714), [How to implement atoi using SIMD?](https://stackoverflow.com/q/35127060/995714) – phuclv Mar 31 '22 at 14:27

1 Answers1

1

You might want to have a look at loop unrolling. When the body of a loop is short enough, checking the loop condition every iteration might be relatively expensive.

A specific and interesting way of implementing loop unrolling is called Duff's device: https://en.wikipedia.org/wiki/Duff%27s_device

Here's the version for your function:

inline uint64_t strtol_duff(char* s, int len)
{
    uint64_t val = 0;
    int n = (len + 7) / 8;
    int i = 0;
    switch (len % 8) {
    case 0: do {
                 val = val * 10 + (*(s + i++) - '0');
    case 7:      val = val * 10 + (*(s + i++) - '0');
    case 6:      val = val * 10 + (*(s + i++) - '0');
    case 5:      val = val * 10 + (*(s + i++) - '0');
    case 4:      val = val * 10 + (*(s + i++) - '0');
    case 3:      val = val * 10 + (*(s + i++) - '0');
    case 2:      val = val * 10 + (*(s + i++) - '0');
    case 1:      val = val * 10 + (*(s + i++) - '0');
    } while (--n > 0);
    }
    return val;
};

To be honest, in your case I believe you will not see a huge benefit because the loop's body is not that tiny. It's all very much system dependent and requires experimentation (like most optimizations). Good compiler optimizers might unroll the loop automatically if it is actually beneficial.

But it's worth to try.

wohlstad
  • 12,661
  • 10
  • 26
  • 39