How to convert a string into a custom base using 2 alphabet chars per letter

Question

The functions below convert the number 255 (base 10) into 'FF' (base 16) using the base-16 alphabet '0123456789ABCDEF'.

I can't figure out how to modify the functions so that they use 2 characters of the alphabet per letter, so that the number 255 (base 10) would convert to 'xFxF' (base 16) using the modified base-16 alphabet 'x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF'.

def v2r(num, alphabet):
  """Convert base 10 number into a string of a custom base (alphabet)."""
  alphabet_length = len(alphabet)
  result = ''
  while num > 0:
    result = alphabet[num % alphabet_length] + result
    num  = num // alphabet_length
  return result


def r2v(data, alphabet):
  """Convert string of a custom base (alphabet) back into base 10 number."""
  alphabet_length = len(alphabet)
  num = 0
  for char in data:
    num = alphabet_length * num + alphabet[:alphabet_length].index(char)
  return num

base16 = v2r(255, '0123456789ABCDEF')
base10 = r2v(base16, '0123456789ABCDEF')
print(base16, base10)
# output: FF 255

# base16 = v2r(255, 'x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF')
# base10 = r2v(base16, 'x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF')
# print(base16, base10)
# output: xFxF 255

For such a small program, executing the code myself with pen and paper for a small example often helps me to find the problem quite easily. — MrSmith42, Dec 04 '20 at 12:35
@MrSmith42 yeah it's quite a simple concept but I've never really worked with base conversions, so I don't have a complete understanding of the concept itself — AlekseyHoffman, Dec 04 '20 at 12:51
The only thing you have to modify in `v2r` is the line `result = alphabet[num % alphabet_length] + result`. There are several ways to modify it that would work. Good luck! — Stef, Dec 04 '20 at 12:56
Note that the simplest way, both for `v2r` and `r2v`, is to first split your string into a sequence of length-2 strings. See this related question: [Split string every nth character](https://stackoverflow.com/questions/9475241/split-string-every-nth-character) — Stef, Dec 04 '20 at 12:58
Here are the basics of base conversion: https://en.wikipedia.org/wiki/Base_conversion — MrSmith42, Dec 04 '20 at 13:03
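
The chunking idea from the comments above could be sketched roughly as follows. This is only an illustration, not code from the thread: the names `split_symbols`, `v2r_chunked` and `r2v_chunked` and the `width` parameter are hypothetical, and the sketch assumes every symbol in the alphabet has the same width.

```python
def split_symbols(text, width=2):
    """Split a string into consecutive fixed-width symbols (hypothetical helper)."""
    if len(text) % width:
        raise ValueError("length must be a multiple of the symbol width")
    return [text[i:i + width] for i in range(0, len(text), width)]


def v2r_chunked(num, alphabet, width=2):
    """Encode a non-negative integer using `width` characters per digit."""
    symbols = split_symbols(alphabet, width)
    base = len(symbols)
    if num == 0:
        return symbols[0]  # assumption: zero is written as a single zero symbol
    result = ''
    while num > 0:
        result = symbols[num % base] + result  # prepend the next digit
        num //= base
    return result


def r2v_chunked(data, alphabet, width=2):
    """Decode a string produced by v2r_chunked back into an integer."""
    symbols = split_symbols(alphabet, width)
    base = len(symbols)
    num = 0
    for chunk in split_symbols(data, width):
        num = base * num + symbols.index(chunk)
    return num


alphabet = 'x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF'
encoded = v2r_chunked(255, alphabet)
print(encoded, r2v_chunked(encoded, alphabet))
# output: xFxF 255
```

Keeping the alphabet as a flat string and splitting it inside the functions leaves the call sites from the question unchanged; passing a list of symbols instead is the other obvious option.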

m.raynal · Accepted Answer · 2020-12-04T13:48:29.530

2

Here is a possible workaround. I think your bug came from confusion about how Python treats strings as iterables: indexing a string yields single characters, so a two-character symbol can't be looked up that way.
I've turned the base-16 alphabet into a list of two-character items and adjusted the functions slightly to take this into account, and it looks like it works.

def v2r(num, alphabet):
    """Convert a base-10 number into a list of symbols from a custom alphabet."""
    alphabet_length = len(alphabet)
    result = []
    while num > 0:
        # Prepend, so the most significant symbol ends up first.
        result = [alphabet[num % alphabet_length]] + result
        num = num // alphabet_length
    return result


def r2v(data, alphabet):
    """Convert a list of symbols from a custom alphabet back into a base-10 number."""
    alphabet_length = len(alphabet)
    num = 0
    for symbol in data:
        num = alphabet_length * num + alphabet.index(symbol)
    return num

alphabet = [
    'x0','x1', 'x2', 'x3', 'x4', 'x5', 'x6', 'x7', 'x8',
    'x9', 'xA', 'xB', 'xC', 'xD', 'xE', 'xF'
]
base16 = v2r(255, alphabet)
base10 = r2v(base16, alphabet)
print(''.join(base16), base10)
# output: xFxF 255

Following the OP's comment: to make each symbol a two-digit hex pair (`00` to `ff`), just declare the following alphabet:

hexa = '0123456789abcdef'
alphabet = [
    a+b for a in hexa for b in hexa
]
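
As a rough check of what this 256-symbol alphabet produces, here is a small self-contained sketch. `v2r` is repeated from above so the block runs on its own; `hex_pairs` is a hypothetical helper added only for comparison, on the assumption that the desired output is ordinary hexadecimal padded to an even number of digits.

```python
def v2r(num, alphabet):
    """Return the symbols of `num` in the given alphabet, most significant first."""
    base = len(alphabet)
    result = []
    while num > 0:
        result = [alphabet[num % base]] + result
        num //= base
    return result


hexa = '0123456789abcdef'
alphabet = [a + b for a in hexa for b in hexa]  # '00', '01', ..., 'ff'

for n in (15, 255, 256, 65535):
    print(n, ''.join(v2r(n, alphabet)))
# output:
# 15 0f
# 255 ff
# 256 0100
# 65535 ffff


def hex_pairs(num):
    """Hypothetical comparison helper: hex digits padded to an even length."""
    digits = format(num, 'x')
    return digits.zfill(len(digits) + len(digits) % 2)


print(all(''.join(v2r(n, alphabet)) == hex_pairs(n) for n in range(1, 5000)))
# output: True
```

Note that `v2r(0, alphabet)` returns an empty list, so zero needs separate handling if it has to be encoded.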

edited Dec 04 '20 at 13:48

answered Dec 04 '20 at 13:01


Did you test it? It looks like `result += [alphabet[num % alphabet_length]]` is appending characters at the end instead of adding them at the beginning; so the output is going to be reversed. – Stef Dec 04 '20 at 13:05
Thanks for the solution. I know why I was so confused. I wanted to make each character to encode 2 bits: `01, 02, 03, ..., FF`, rather than just use a custom char like `xF` I just didn't explain it well at all... – AlekseyHoffman Dec 04 '20 at 13:12
@Stef is there a way to modify it so that each character actually encodes 2 bits like that `01, 02, 03, ..., FF`? So that effectively each letter encodes 255 combinations (0 - 255 in base 10) – AlekseyHoffman Dec 04 '20 at 13:14
@AlekseyHoffman It already does that; you can try by using alphabet `[a+b for a in '0123456789ABCDEF' for b in '0123456789ABCDEF']` which is `['00', '01', '02', '03', ..., 'FE', 'FF']` – Stef Dec 04 '20 at 13:24
@AlekseyHoffman But as I said, I think the line `result += [alphabet[num % alphabet_length]]` in this answer is wrong and you should instead use the line from your original code. You can test whether it works well or not by trying to convert a number in base 10 using the alphabet `[' 0', ' 1', ' 2', ..., ' 9']`. – Stef Dec 04 '20 at 13:26
and then `for x in range(1000): y = ''.join(v2r(x, [' 0', ..., ' 9']).split()); if y != str(x): print(y, x)` – Stef Dec 04 '20 at 13:34
@Stef you're completely right, I'm editing straight away – m.raynal Dec 04 '20 at 13:34
@Stef hmm, guys, it seems like characters don't actually encode 2 bits. I just explained it completely wrong. What I actually need is for it to encode 2 bits, not just use 2 characters as a letter. So if we have an alphabet 256 symbols long, `['00', '01', '02', '03', ... 'FF']`, then when we encode the number 15 I want it to return `0F`, for 16 return `10`, and for 256 return `FF`. Only when it actually fills the 2 bits, for the number 257, should it overflow and return 4 bits: `00 00`. – AlekseyHoffman Dec 04 '20 at 13:43
reedited :) it should do what you want it to do now – m.raynal Dec 04 '20 at 13:48
@m.raynal thank you, I generated the alphabet in a similar way, but as I said in the previous comment, I explained the problem incorrectly: this algorithm doesn't fill the bits. For the number `15` it returns `0F 15` instead of `0F`, so it doesn't actually use 256 combinations per letter. It should only return 4 bits `00 00` for the number 256 (or 257?), when it has used all the combinations of 2 bits `['00', '01', '02', '03', ... 'FF']` and overflows to 4 bits – AlekseyHoffman Dec 04 '20 at 14:00
256 should be `'0100'` and 255 should be `'FF'`. Presumably this is already the case with @m.raynal 's code. – Stef Dec 04 '20 at 14:02
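
The sanity check suggested in the comments above (encode in base 10 with a space-prefixed alphabet, strip the spaces, compare with `str(x)`) could be written out roughly like this. It is a sketch, not code from the thread: it assumes the list-returning `v2r` from the answer, repeated here so the block runs on its own, and it starts at 1 because `v2r(0, ...)` returns an empty list.

```python
def v2r(num, alphabet):
    """List-returning v2r from the answer: symbols, most significant first."""
    base = len(alphabet)
    result = []
    while num > 0:
        result = [alphabet[num % base]] + result
        num //= base
    return result


# Two-character "decimal" alphabet: [' 0', ' 1', ..., ' 9']
decimal_alphabet = [' ' + d for d in '0123456789']

mismatches = []
for x in range(1, 1000):
    encoded = ''.join(v2r(x, decimal_alphabet))  # e.g. ' 4 2' for 42
    y = encoded.replace(' ', '')                 # drop the padding spaces
    if y != str(x):
        mismatches.append((y, x))

print(mismatches)
# output: []
```

If the symbols were appended instead of prepended, every number whose decimal form is not a palindrome would show up in `mismatches` with its digits reversed.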

score 1 · Answer 2 · answered Dec 04 '20 at 13:06

I suggest removing the 'x' before starting the calculation. To do that, strip it as the first step of each function:

alphabet = alphabet.replace('x', '')

so both functions accept either alphabet, and r2v can read both 'FFF' and 'xFxFxF'. Note that v2r then always returns the plain form, e.g. 'FFF'.

def v2r(num, alphabet):
  """Convert base 10 number into a string of a custom base (alphabet)."""
  alphabet = alphabet.replace('x', '')  # added: drop the 'x' prefixes
  alphabet_length = len(alphabet)
  result = ''
  while num > 0:
    result = alphabet[num % alphabet_length] + result
    num = num // alphabet_length
  return result


def r2v(data, alphabet):
  """Convert string of a custom base (alphabet) back into base 10 number."""
  alphabet = alphabet.replace('x', '')  # added: drop the 'x' prefixes
  data = data.replace('x', '')  # added: otherwise 'x' is not found in the alphabet
  alphabet_length = len(alphabet)
  num = 0
  for char in data:
    num = alphabet_length * num + alphabet.index(char)
  return num
