3

[EDIT] I am NOT accepting any answer which involves BigInteger, or other similarly inefficient method. Please actually read the question before answering!

Java, annoyingly enough, does not support unsigned number types. You can convert a byte, short or int to unsigned, by using the next bigger type, for example:

short s = -10;
int unsigned_short = s & 0xFFFF;

But you cannot do this with long, since there is no bigger type.

So, how do you convert a signed long into an "unsigned" base-X, in my case base-36, and back? The Long class has those methods, but treat longs as signed, simply because they are.

I could probably do that using some manipulation and BigInteger, but BigInteger is incredibly slow, and creates garbage through temporary BigInteger creation. And I'm going to be doing a lot of those conversions (I think). I need an algorithm that is as efficient as the default implementation of Long.toString(long i, int radix).

Trying to adapt the code of Long.toString() I come to:

final int RADIX = 36;
final char[] DIGITS = { '0', ... , 'Z' };
long value = 100;
if (value == 0) {
    return "0";
} else {
    char[] buf = new char[13];
    int charPos = 12;
    long i = value;
    while (i != 0) {
        buf[charPos--] = DIGITS[Math.abs((int) (i % RADIX))];
        i /= RADIX;
    }
    return new String(buf, charPos + 1, (12 - charPos));
}

But it does not handle negative values correctly, despite the Math.abs().

Once this works, I need the reverse conversion, but I'm hoping it will be easier. Your welcome to put it in your answer too.

[EDIT] Actually, I just looked at the code for Long.parseLong(String s, int radix), and it looks more complicated than Long.toString(long i, int radix).

Eugene Retunsky
  • 13,009
  • 4
  • 52
  • 55
Sebastien Diot
  • 7,183
  • 6
  • 43
  • 85

5 Answers5

8
    long l = 0xffffffffffffffffL; // any long, e.g. -1

    // to string
    BigInteger bi = new BigInteger(Long.toString(l & ~(1L << 63)));
    if (l < 0) bi = bi.setBit(64);
    final String b36 = bi.toString(36);
    System.out.println("original long:" + l);
    System.out.println("result 36: " + b36);

    // parse
    final BigInteger parsedBi = new BigInteger(b36, 36);

    l = parsedBi.longValue();
    if (parsedBi.testBit(64)) l = l | (1L << 63);
    System.out.println("parsed long = " + l);

Benchmarking (one million operations):

    // toString
    long l = 0x0ffffffffffffeffL;
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) toStringBi(l);
        System.out.println("BigInteger time = " + 
            (System.currentTimeMillis() - start) + " ms.");
    }
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) Long.toString(l, 36);
        System.out.println("Long.toString time = " + 
           (System.currentTimeMillis() - start) + "ms.");
    }
    // Parsing
    final String b36 = toStringBi(l);
    final String long36 = Long.toString(l, 36);
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            final BigInteger parsedBi = new BigInteger(b36, 36);
            l = parsedBi.longValue();
            if (parsedBi.testBit(64)) l = l | (1L << 63);
        }
        System.out.println("BigInteger.parse time = " 
            + (System.currentTimeMillis() - start) + " ms.");
    }
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) Long.parseLong(long36, 36);
        System.out.println("Long.parseLong time = " 
            + (System.currentTimeMillis() - start) + "ms.");
    }
  • BigInteger time = 1027 ms.
  • Long.toString time = 244ms.
  • BigInteger.parse time = 297 ms.
  • Long.parseLong time = 132ms.
Eugene Retunsky
  • 13,009
  • 4
  • 52
  • 55
  • OK. You get +1 because I can test another implementation using this, but I will not accept it, as I have *specifically* said that the solution should not use BigInteger. – Sebastien Diot Mar 28 '12 at 19:29
  • 3
    I just made a benchmark - it is slower, but it performs a million conversions in about a second (only four times slower than Long.toString). If you'd like to kill time for micro-optimizations... Well, I don't think it's worth doing. BigInteger is fine in this case. – Eugene Retunsky Mar 28 '12 at 19:40
  • +1 Those results are surprising, I must say. The "incredibly slow" comment I made referred to some other use of BigInteger I benchmarked before. I wanted to put a bounty on this Q, but it's not possible anymore if there is already one answer. I will just have to work it out myself. :/ – Sebastien Diot Mar 29 '12 at 11:03
  • I don't think StackOverflow works this way. It's a Q/A service, not outsourcing company. You get education here, not someone works for you (even for bounty). – Eugene Retunsky Mar 29 '12 at 16:46
  • Most of my values are probably going to be >= 0. In that case, I can actually speed things up: if (value >= 0) return Long.toString(value, RADIX); else . Like this most values are converted quickly. Same thing applies in reverse based on the String length. – Sebastien Diot Mar 29 '12 at 17:48
  • 1
    My experience differs. I had a question unanswered for at least a month about Scala. Then some *third party* put a bounty on my question (I didn't know this was possible), and in two days or so it was answered. People here DO take bounty as "payment" for hard questions. – Sebastien Diot Mar 29 '12 at 17:51
  • Bounty just helps the question get attention. The one true bounty people here accept as payment is the fulfillment one gets from solving a puzzle. Remember, there are people paying for little books with crossword puzzles in them. Stackoverflow just provides the challenges for free :) – Stijn de Witt Sep 17 '13 at 20:36
2

Another option is to use UnsignedLongs from the Google guava-libraries (which have lots of other goodies as well):

String s = UnsignedLongs.toString( -1L, Character.MAX_RADIX );

and

long l = UnsignedLongs.parseUnsignedLong( "2jsu3j", 36 );

Added to the benchmark from +EugeneRetunsky (see below) this gives the following times on my machine:

  • BigInteger time (1st run) = 1306 ms.
  • BigInteger time (2nd run) = 1075 ms.
  • Long.toString time = 422ms.
  • UnsignedLongs.toString time = 445ms.
  • BigInteger.parse time = 298 ms.
  • Long.parseLong time = 164ms.
  • UnsignedLongs.parseUnsignedLong time = 107ms.

Out of curiosity, I let the first test run twice to check if that would improves the time. It consistently does (to ~400ms on my machine), also for the case of UnsignedLongs. The other options do not seem to profit any more from the hot-spot compiler.

public class UnsignedLongsTest {
private static String toStringBi( long l ) {
    BigInteger bi = new BigInteger(Long.toString(l & ~(1L << 63)));
    if (l < 0) {
        bi = bi.setBit(64);
    }
    final String b36 = bi.toString(36);
    return b36;
}

public static void main( String[] args ) {
    // toString
    long l = 0x0ffffffffffffeffL;
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            toStringBi(l);
        }
        System.out.println("BigInteger time (1st run) = " +
                (System.currentTimeMillis() - start) + " ms.");
    }
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            toStringBi(l);
        }
        System.out.println("BigInteger time (2nd run) = " +
                (System.currentTimeMillis() - start) + " ms.");
    }
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            Long.toString(l, 36);
        }
        System.out.println("Long.toString time = " +
           (System.currentTimeMillis() - start) + "ms.");
    }
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            UnsignedLongs.toString(l, 36);
        }
        System.out.println("UnsignedLongs.toString time = " +
                (System.currentTimeMillis() - start) + "ms.");
    }
    // Parsing
    final String b36 = toStringBi(l);
    final String long36 = Long.toString(l, 36);
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            final BigInteger parsedBi = new BigInteger(b36, 36);
            l = parsedBi.longValue();
            if (parsedBi.testBit(64)) {
                l = l | (1L << 63);
            }
        }
        System.out.println("BigInteger.parse time = "
            + (System.currentTimeMillis() - start) + " ms.");
    }
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            Long.parseLong(long36, 36);
        }
        System.out.println("Long.parseLong time = "
            + (System.currentTimeMillis() - start) + "ms.");
    }
    {
        final long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            UnsignedLongs.parseUnsignedLong( long36, 36 );
        }
        System.out.println("UnsignedLongs.parseUnsignedLong time = "
                + (System.currentTimeMillis() - start) + "ms.");
    }
}
Ivin
  • 1,081
  • 11
  • 21
1

Since despite "NOT accepting any answer which involves BigInteger", you accepted a BigInteger solution, here is an alternate BigInteger solution. Rather than masking off the sign, you can force the signum to always positive:

long input = 0xffffffffffffffffL; // any long, e.g. -1
byte[] bytes = ByteBuffer.allocate(8).putLong(input).array();

String base36 = new BigInteger(1, bytes).toString(36);
mckamey
  • 17,359
  • 16
  • 83
  • 116
  • My end-solution, as @tc. pointed out, was to first check the sign, and use Long whenever possible. This gave enough speed for my requirements. – Sebastien Diot Jul 26 '12 at 15:13
  • FYI, the masking solution failed a round trip decoding for me when I tried it. I don't recall the value which caused it. – mckamey Jul 28 '12 at 18:31
1

Also, if you are working with the long as a byte array, @JonnyDee has an algorithm (in Python but it's short) for converting between any two bases which is applicable here if you consider the byte array to be a number with Base-256 digits. Converting back to bytes is just converting base-36 to base-256.

https://stackoverflow.com/a/6158278/43217

And his corresponding blog post:

https://jonnydee.wordpress.com/2011/05/01/convert-a-block-of-digits-from-base-x-to-base-y/

Community
  • 1
  • 1
mckamey
  • 17,359
  • 16
  • 83
  • 116
  • This doesn't look fast, but it's a general solution, which is a good thing. I'm actually considering moving to Base-40 anyway, because you can encode with Base-40 the same number of characters (12) in a 64-bits long, than you could with Base-36. – Sebastien Diot Jul 26 '12 at 15:20
  • Interesting. What four characters are you using for beyond 36? Upper case or punctuation? I personally really like the case insensitivity of Base-36. It's good in URLs. – mckamey Jul 28 '12 at 18:32
  • I "don't do URLs" ... I intend to use punctuation, as I want to use it for user names. Adding underline (for space), dash, dot and apostrophe will help make logins more readable. Also, covering a greater range of values helps improve hashing. I want to use longs for *all* my IDs; I have hardcoded long as the table key type in my DB API (table value is byte[]). – Sebastien Diot Jul 29 '12 at 16:22
  • So users of your system will have random user ids like `z-ufh_w'.posg` to correspond with their long key? – mckamey Jul 30 '12 at 17:24
  • No! :D I want to let them choose their login ID, like "whack-a-mole", "jack_o'neill" of "j.smith", and turn that into a long key. – Sebastien Diot Jul 31 '12 at 12:33
  • That might be a limited search space. For instance, "sebastien.diot" wouldn't be allowed because it is too long. – mckamey Jul 31 '12 at 21:24
1

The problem is that you're looking for a fast unsigned 64-bit divmod given only a signed 64-bit divmod. Searching for udivmoddi3 should give you a few implementations in C — these are typically used to do 64-bit divmod on architectures that only support 32-bit divmod in hardware.

Note that you only need to grab the bottom digit — once you've done this, the quotient will be positive and you can use Long.toString().

If the radix is even (you state base 36), you can get the bottom digit without too much hassle (my math may be wrong):

int bottomDigit = ((value>>>1)%(radix/2))<<1)|((int)value&1);
long rest = (value>>>1)/(radix/2);
if (rest == 0)
{
  return Integer.toString(bottomDigit,radix);
}
return Long.toString(rest,radix) + Integer.toString(bottomDigit,radix);

An obvious further optimization is to call Long.toString() directly if the value is positive.

tc.
  • 33,468
  • 5
  • 78
  • 96