1

I'm looking to "truncate" 0's from long values.

For example, for long value "1234560000", I'd like to drop the last four 0's (to 123456) and also need to know how many 0's have been dropped.

I can achieve this with a % 10 operations:

void Main()
{
    Console.WriteLine(Truncate(1234560000));
}

public static long Truncate(long mantissa)
{
    int droppedZeros = 0;

    while (mantissa % 10 == 0)
    {
        mantissa /= 10;
        droppedZeros++;
    }

    return mantissa;
}

This piece of code getting is getting called millions of times and is performance critical and I'm looking for ways to improve performance to achieve the same without modulo (can this be done with bit shifts?).

Per request, I added some benchmarks numbers including a benchmark that performs division with compile time known constant to showcase the relative overhead of the modulo operation:

          Method |      Mean |     Error |    StdDev |    Median | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
---------------- |----------:|----------:|----------:|----------:|------------:|------------:|------------:|--------------------:|
  DivideNoModulo |  1.863 ms | 0.0431 ms | 0.1272 ms |  1.855 ms |           - |           - |           - |                   - |
     ModuloBasic | 21.342 ms | 0.8776 ms | 2.5876 ms | 20.813 ms |           - |           - |           - |                   - |
   DivisionBasic | 18.159 ms | 1.7218 ms | 5.0768 ms | 15.937 ms |           - |           - |           - |                   - |
 DivisionSmarter |  7.510 ms | 0.5307 ms | 1.5649 ms |  7.201 ms |           - |           - |           - |                   - |
   ModuloSmarter |  8.517 ms | 0.1673 ms | 0.2886 ms |  8.531 ms |           - |           - |           - |                   - |
  StringTruncate | 42.370 ms | 1.7358 ms | 5.1181 ms | 40.566 ms |   1000.0000 |           - |           - |           8806456 B |

Benchmark code:

 [SimpleJob(RunStrategy.ColdStart, 1)]
    [MemoryDiagnoser]
    public class EncoderBenchmark
    {
        private long _mantissa;

        [Benchmark]
        public void DivideNoModulo()
        {
            for (var i = 0; i < 100000; i++)
            {
                _mantissa = 12345600000000;

                _mantissa /= 100000000;
            }
        }

        [Benchmark]
        public void ModuloBasic()
        {
            for (var i = 0; i < 100000; i++)
            {
                _mantissa = 12345600000000;

                while (_mantissa % 10 == 0)
                {
                    _mantissa /= 10;
                }
            }
        }

        [Benchmark]
        public void DivisionBasic()
        {
            for (var i = 0; i < 100000; i++)
            {
                _mantissa = 12345600000000;

                for (;;)
                {
                    long temp = _mantissa / 10;
                    if (temp * 10 != _mantissa)
                        break;

                    _mantissa = temp;
                }
            }
        }


        [Benchmark]
        public void DivisionSmarter()
        {
            for (var i = 0; i < 100000; i++)
            {
                _mantissa = 12345600000000;


                for (; ; )
                {
                    long temp = _mantissa / 1000000;
                    if (temp * 1000000 != _mantissa)
                        break;

                    _mantissa = temp;
                }

                for (; ; )
                {
                    long temp = _mantissa / 10;
                    if (temp * 10 != _mantissa)
                        break;

                    _mantissa = temp;
                }
            }
        }

        [Benchmark]
        public void ModuloSmarter()
        {
            for (var i = 0; i < 100000; i++)
            {
                _mantissa = 12345600000000;

                while (_mantissa % 1000000 == 0)
                {
                    _mantissa /= 1000000;
                }

                while (_mantissa % 10 == 0)
                {
                    _mantissa /= 10;
                }
            }
        }

        [Benchmark]
        public void StringTruncate()
        {
            for (var i = 0; i < 100000; i++)
            {
                _mantissa = 12345600000000;

                _mantissa = long.Parse(_mantissa.ToString().TrimEnd('0'));
            }
        }
    }
TJF
  • 2,248
  • 4
  • 28
  • 43
  • 1
    Can you show us the benchmarkdotnet results for the options you have tried so far? Include `string.TrimEnd('0')` as one of the options (although I am pretty sure it will be terrible). – mjwills Feb 06 '19 at 03:47
  • Can you give us a bit more context as to **why** you want to change `1234560000` to `123456`, just in case that opens up some other ways of solving the issue? – mjwills Feb 06 '19 at 03:48
  • I don't have another option, so I'm not sure what comparison you are looking for or why it would be relevant. This is in the context of an encoding algorithm, where we are converting a double to a base 10 representation of a mantissa and an exponent and then reducing the power of the exponent to encode the mantissa into a smaller byte array – TJF Feb 06 '19 at 03:49
  • If there really is a performance issue, have you tried batching the data and either running in separate threads (and using SSE, if you're on Intel) or on a GPU? And would this question be better suited to [codereview.se]? – Ken Y-N Feb 06 '19 at 04:01
  • https://stackoverflow.com/questions/11040646/faster-modulus-in-c-c may be of interest (read all of it - it has some stuff specific to powers of 2, and stuff that isn't). – mjwills Feb 06 '19 at 04:39
  • Seeing the benchmarks, two things strike me: first compiler optimisation, and second, if you are outputting the value as a string, there should be a `ToString()` and `StringTruncate()` should lose the `Parse()`. And third, using a constant `_mantissa` with lots of zeros will skew timings towards `ModuloSmarter()`. – Ken Y-N Feb 06 '19 at 04:41
  • I'm not following how I should understand your comment of "compiler optimisation" and I'm not outputting the values as string - the string/truncate implementation was added per the request of @mjwillis – TJF Feb 06 '19 at 04:44
  • Your original code was `Console.WriteLine(Truncate(1234560000));` and your description implies you are writing characters, so perhaps you should benchmark the output formatting too? – Ken Y-N Feb 06 '19 at 04:47
  • The original snippet was for illustrative purposes so someone could run the relevant snippet and see output. It does not capture the business need. I'm simply looking for an arithmetic or bitwise implementation to achieve the same without modulo operation. In regards to your comment of the benchmark skewed towards ModuloSmarter - yes it is, put it covers my actual scenario of the mantissa being truncated having a length of 14 – TJF Feb 06 '19 at 04:56

3 Answers3

3

You'll be very unlikely to get it working efficiently with bit shifting since that's ideal for dividing by powers of two, of which 10 is not one.

A possibility for improvement, where you may often get numbers with lots of trailing zeros, is to use multiple loops to do the work in bigger chunks, such as:

if (mantissa != 0) {
    while (mantissa % 1000000 == 0) {
        mantissa /= 1000000;
        droppedZeros += 6;
    }
    while (mantissa % 1000 == 0) {
        mantissa /= 1000;
        droppedZeros += 3;
    }
    while (mantissa % 10 == 0) {
        mantissa /= 10;
        droppedZeros++;
    }
}

This will usually result in fewer instructions being performed but, as with all optimisations, measure, don't guess! It may be that the added code complexity is not worth the gains you get (if any).


Note that I've also caught the mantissa == 0 case since that will result in an infinite loop in your original code.


Another possibility you may want to consider is if you're doing this operation to the same item multiple times. For example, let's say you have a collection of integers and, each time you need to process one of them, you have to strip off and count the trailing zeros.

In that case, you could actually store them in a different way. For example, consider the (pseudo-code) structure:

struct item:
    int originalItem
    int itemWithoutZeroes
    int zeroCount

Basically, whenever you first receive an item (such as 1234560000), you immediately convert it to the structure once and once only:

{ 1234560000, 123456, 4 }

This provides a cached version of the zero-stripped item so you never have to calculate it again.

So, if you want the stripped mantissa, you just use item.itemWithoutZeros. If you want to output the number in its original form, you can use item.originalItem. And, if you want the count of zeros, use item.zeroCount.

Obviously, this will take up more storage space but you'll often find that optimisation is a time/space trade-off.

paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • Thank you for the suggestion - this is actually what I'm doing right now. I was hoping to get rid of the modulo operations altogether as in my benchmarks they seem to take up the bulk of the cpu cycles and I was curious if there is a more performant way to achieve the same – TJF Feb 06 '19 at 03:57
  • Thanks for the edit, unfortunately the expected numbers are entirely variable. However, there is some expectations that can be made in the distribution of 0's. 8 0s are the most likely outcome, followed by 6 0s. After that, a fairly even distribution is likely. In regards to truncating in batching, attempting to truncate 6 0s, followed by individual 0s seems to make the most sense to me – TJF Feb 06 '19 at 14:29
1

A bit faster replace '%' with '*'

public static long T(long mantissa)
{
    if (mantissa == 0)
        return 0;

    int droppedZeros = 0;

    for (; ; )
    {
        long temp = mantissa / 1000000;
        if (temp * 1000000 != mantissa)
            break;

        mantissa = temp;
        droppedZeros += 6;
    }

    for (; ; )
    {
        long temp = mantissa / 1000;
        if (temp * 1000 != mantissa)
            break;

        mantissa = temp;
        droppedZeros += 3;
    }

    for (; ; )
    {
        long temp = mantissa / 10;
        if (temp * 10 != mantissa)
            break;

        mantissa = temp;
        droppedZeros++;
    }

    return mantissa;
}
shingo
  • 18,436
  • 5
  • 23
  • 42
  • Thank you, yes that is a touch faster indeed - I've updated the benchmark numbers with 2 implementations for the above that compare to similar modulo operations – TJF Feb 06 '19 at 14:22
0

Update your Truncate logic as below.

public static long Truncate(long mantissa)
{
    if (mantissa == 0)
      return 0;
    var mantissaStr = mantissa.ToString();
    var mantissaTruncated = mantissaStr.TrimEnd('0');
    int droppedZeros = mantissaStr.Length - mantissaTruncated.Length;
    return Convert.ToInt64(mantissaTruncated);
}
Karan
  • 12,059
  • 3
  • 24
  • 40