8

The following code defines a sequence of names that are mapped to numbers. It is designed to take a number and retrieve a specific name. The class operates by ensuring the name exists in its cache, and then returns the name by indexing into its cache. The question in this: how can the name be calculated based on the number without storing a cache?

The name can be thought of as a base 63 number, except for the first digit which is always in base 53.

class NumberToName:

    def __generate_name():
        def generate_tail(length):
            if length > 0:
                for char in NumberToName.CHARS:
                    for extension in generate_tail(length - 1):
                        yield char + extension
            else:
                yield ''
        for length in itertools.count():
            for char in NumberToName.FIRST:
                for extension in generate_tail(length):
                    yield char + extension

    FIRST = ''.join(sorted(string.ascii_letters + '_'))
    CHARS = ''.join(sorted(string.digits + FIRST))
    CACHE = []
    NAMES = __generate_name()

    @classmethod
    def convert(cls, number):
        for _ in range(number - len(cls.CACHE) + 1):
            cls.CACHE.append(next(cls.NAMES))
        return cls.CACHE[number]

    def __init__(self, *args, **kwargs):
        raise NotImplementedError()

The following interactive sessions show some of the values that are expected to be returned in order.

>>> NumberToName.convert(0)
'A'
>>> NumberToName.convert(26)
'_'
>>> NumberToName.convert(52)
'z'
>>> NumberToName.convert(53)
'A0'
>>> NumberToName.convert(1692)
'_1'
>>> NumberToName.convert(23893)
'FAQ'

Unfortunately, these numbers need to be mapped to these exact names (to allow a reverse conversion).


Please note: A variable number of bits are received and converted unambiguously into a number. This number should be converted unambiguously to a name in the Python identifier namespace. Eventually, valid Python names will be converted to numbers, and these numbers will be converted to a variable number of bits.


Final solution:

import string

HEAD_CHAR = ''.join(sorted(string.ascii_letters + '_'))
TAIL_CHAR = ''.join(sorted(string.digits + HEAD_CHAR))
HEAD_BASE, TAIL_BASE = len(HEAD_CHAR), len(TAIL_CHAR)

def convert_number_to_name(number):
    if number < HEAD_BASE: return HEAD_CHAR[number]
    q, r = divmod(number - HEAD_BASE, TAIL_BASE)
    return convert_number_to_name(q) + TAIL_CHAR[r]
Noctis Skytower
  • 21,433
  • 16
  • 79
  • 117
  • Why this special requirement ? Could you please elaborate the purpose of no cache? – dan-boa Jun 15 '12 at 14:50
  • The cache consumes a lot of memory that really shouldn't be needed. – recursive Jun 15 '12 at 14:52
  • 2
    A variable number of bits are received and converted unambiguously into a number. This number should be converted unambiguously to a name in the Python identifier namespace. Eventually, valid Python names will be converted to numbers, and these numbers will be converted to a variable number of bits. – Noctis Skytower Jun 15 '12 at 14:56
  • @recursive: That is the reason for the question. "How can the name be calculated based on the number without storing a cache?" – Noctis Skytower Jun 15 '12 at 14:57
  • @NoctisSkytower: Right, I was answering dan-boa's question. – recursive Jun 15 '12 at 15:00
  • @NoctisSkytower did you used to write java? – Jakob Bowyer Jun 15 '12 at 15:05
  • @JakobBowyer: For a little more than a semester at college. The reason the code is using a class as a namespace is because the ideas are still in their infancy, and the module that contains the code has a bunch of junk (experimental code) in it. Instead of creating a whole new module at this point, everything is being kept together and should be refactored later. Using the class as a namespace was just a very explicit way of grouping this section of code together in the module as it is being developed. – Noctis Skytower Jun 15 '12 at 15:12
  • Could you use binascii.hexlify()/unhexlify() to convert bits to hexstrings and back for your case? – jfs Jun 15 '12 at 15:14
  • Instead of thinking of the name as a mixed-base number, could the input be changed to a base 63 number possibly with a leading underscore if the first character is a digit (which gets ignored when converting it to a number)? – martineau Jun 15 '12 at 15:35
  • 1. Is it correct that your input at some point is a bytestring? 2. '_'+any_hexstring is a valid Python identifier unless there is a length restriction. Are there any other constraints? – jfs Jun 15 '12 at 17:02
  • @J.F.Sebastian: Those functions do not map a variable number of bits to the Python identifier namespace and vise versa. So no, hexlify and unhexlify do not meet the requirements. **Reason:** hexlify converts to a multiple of four bits, not one bit. Unhexlify can generate unacceptable names. – Noctis Skytower Jun 15 '12 at 18:34
  • To clarify: `name = '_'+hexlify(bits)` and in reverse `bits = unhexlify(name[1:])` (note: no intermidiate step with numbers). If your input is not a bytestring then provide an example – jfs Jun 15 '12 at 19:32
  • An example can be found here: http://ideone.com/UO7s5 – Noctis Skytower Jun 15 '12 at 19:49
  • @Noctis Skytower: Despite all the additional information you've added, I don't think you've ever explained the need to generate legal Python identifier names... – martineau Jun 15 '12 at 22:42
  • @martineau: Is an explanation needed for the question to be answered? If you want to understand, you can create a new question "Why convert variable length bit strings into valid Python identifiers?" There would be more incentive to teach in that context. – Noctis Skytower Jun 18 '12 at 13:50
  • @Noctis Skytower: I wasn't asking for a lesson. Just curious as to how you were planning on using something like this (and because often the best answer a question is "Here's a better or easier way to reach your goal.") – martineau Jun 18 '12 at 15:37

4 Answers4

7

This is a fun little problem full of off by 1 errors.

Without loops:

import string

first_digits = sorted(string.ascii_letters + '_')
rest_digits = sorted(string.digits + string.ascii_letters + '_')

def convert(number):
    if number < len(first_digits):
        return first_digits[number]

    current_base = len(rest_digits)
    remain = number - len(first_digits)
    return convert(remain / current_base) + rest_digits[remain % current_base]

And the tests:

print convert(0)
print convert(26)
print convert(52)
print convert(53)
print convert(1692)
print convert(23893)

Output:

A
_
z
A0
_1
FAQ
recursive
  • 83,943
  • 34
  • 151
  • 241
  • Thanks for your assistance! Seeing your `remain` variable helped out a lot. – Noctis Skytower Jun 15 '12 at 19:57
  • 1
    Alternative for the last three lines: `number, remain = divmod(number - len(first_digits), len(rest_digits)); return convert(number) + rest_digits[remain]`. – Sven Marnach Jun 15 '12 at 21:35
  • Using recursion instead of looping is not necessarily faster (not that you said it was). It does reduce the number of lines of code, though. Nice answer! – martineau Jun 15 '12 at 22:48
  • There is a question now of how to accomplish the opposite calculation: http://stackoverflow.com/questions/17045037 – Noctis Skytower Jun 12 '13 at 19:06
3

What you've got is a corrupted form of bijective numeration (the usual example being spreadsheet column names, which are bijective base-26).

One way to generate bijective numeration:

def bijective(n, digits=string.ascii_uppercase):
    result = []
    while n > 0:
        n, mod = divmod(n - 1, len(digits))
        result += digits[mod]
    return ''.join(reversed(result))

All you need to do is supply a different set of digits for the case where 53 >= n > 0. You will also need to increment n by 1, as properly the bijective 0 is the empty string, not "A":

def name(n, first=sorted(string.ascii_letters + '_'), digits=sorted(string.ascii_letters + '_' + string.digits)):
    result = []
    while n >= len(first):
        n, mod = divmod(n - len(first), len(digits))
        result += digits[mod]
    result += first[n]
    return ''.join(reversed(result))
ecatmur
  • 152,476
  • 27
  • 293
  • 366
2

Tested for the first 10,000 names:

first_chars = sorted(string.ascii_letters + '_')
later_chars = sorted(list(string.digits) + first_chars)

def f(n):
    # first, determine length by subtracting the number of items of length l
    # also determines the index into the list of names of length l
    ix = n
    l = 1
    while ix >= 53 * (63 ** (l-1)):
        ix -= 53 * (63 ** (l-1))
        l += 1

    # determine first character
    first = first_chars[ix // (63 ** (l-1))]

    # rest of string is just a base 63 number
    s = ''
    rem = ix % (63 ** (l-1))
    for i in range(l-1):
        s = later_chars[rem % 63] + s
        rem //= 63

    return first+s
Rodrigo Queiro
  • 1,324
  • 8
  • 15
1

You can use the code in this answer to the question "Base 62 conversion in Python" (or perhaps one of the other answers).

Using the referenced code, I think the answer your real question which was "how can the name be calculated based on the number without storing a cache?" would be to make the name the simple base 62 conversion of the number possibly with a leading underscore if the first character of the name is a digit (which is simply ignored when converting the name back into a number).

Here's sample code illustrating what I propose:

from base62 import base62_encode, base62_decode

def NumberToName(num):
    ret = base62_encode(num)
    return ('_' + ret) if ret[0] in '0123456789' else ret

def NameToNumber(name):
    return base62_decode(name if name[0] is not '_' else name[1:])

if __name__ == '__main__':
    def test(num):
        name = NumberToName(num)
        num2 = NameToNumber(name)
        print 'NumberToName({0:5d}) -> {1!r:>6s}, NameToNumber({2!r:>6s}) -> {3:5d}' \
              .format(num, name, name, num2)

    test(26)
    test(52)
    test(53)
    test(1692)
    test(23893)

Output:

NumberToName(   26) ->    'q', NameToNumber(   'q') ->    26
NumberToName(   52) ->    'Q', NameToNumber(   'Q') ->    52
NumberToName(   53) ->    'R', NameToNumber(   'R') ->    53
NumberToName( 1692) ->   'ri', NameToNumber(  'ri') ->  1692
NumberToName(23893) -> '_6dn', NameToNumber('_6dn') -> 23893

If the numbers could be negative, you might have to modify the code from the referenced answer (and there is some discussion there on how to do it).

Community
  • 1
  • 1
martineau
  • 119,623
  • 25
  • 170
  • 301