0

I need to convert strings to 26-ary and then be able to convert them back.

My current code is:

(define (26-ary-word s)
  (let ([len (string-length s)])
        (let f ([n 0]
                [acc (+
                      (- (char->integer (string-ref s 0)) 97)
                      1)]) ; adding 1 so that all strings start with 'b'
          (if (< n len)
              (f (add1 n) (+ (* acc 26) (- (char->integer (string-ref s n)) 97)))
              acc))))

(define (word-ary-26 n)
  (let f ([n (/ (- n (modulo n 26)) 26)]
          [acc (cons (integer->char (+ (modulo n 26) 97)) '())])
    (if (> n 0)
        (f (/ (- n (modulo n 26)) 26) (cons (integer->char (+ (modulo n 26) 97)) acc))
        (list->string (cdr acc))))) ; remove "b" from front of string

I add 1 to acc to start with, and remove the "b" at the end. This is because multiplying "a" - 97 by 26 is still 0.

This is already ugly, but it doesn't even work. "z" is recorded as "701" when it's in the first position (26^2), which is translated back as "az".

I can add another if clause detecting if the first letter is z, but that's really ugly. Is there any way to do this that sidesteps this issue?

          (if (and (= n 0) (= acc 26))
              (f (add1 n) 51)
              (f (add1 n) (+ (* acc 26) (- (char->integer (string-ref s n)) 97))))

This is the ugly edge case handling code I've had to use.

Alexis King
  • 43,109
  • 15
  • 131
  • 205
JimmyM
  • 360
  • 3
  • 20

1 Answers1

1

Honestly, I'm not entirely sure what your code is doing, but either way, it's far more complicated than it needs to be. Converting a base-26 string to an integer is quite straightforward just by using some higher-order constructs:

; (char-in #\a #\z) -> (integer-in 0 25)
(define (base-26-char->integer c)
  (- (char->integer c) (char->integer #\a)))

; #rx"[a-z]+" -> integer?
(define (base-26-string->integer s)
  (let ([digits (map base-26-char->integer (string->list s))])
    (for/fold ([sum 0])
              ([digit (in-list digits)])
      (+ (* sum 26) digit))))

By breaking the problem into two functions, one that converts individual characters and one that converts an entire string, we can easily make use of Racket's string->list function to simplify the implementation.

The inverse conversion is actually slightly trickier to make elegant using purely functional constructs, but it becomes extremely trivial with an extra helper function that "explodes" an integer into its digits in any base.

; integer? [integer?] -> (listof integer?)
(define (integer->digits i [base 10])
  (reverse
   (let loop ([i i])
     (if (zero? i) empty
         (let-values ([(q r) (quotient/remainder i base)])
           (cons r (loop q)))))))

Then the implementation of the string-generating functions becomes obvious.

; (integer-in 0 25) -> (char-in #\a #\z)
(define (integer->base-26-char i)
  (integer->char (+ i (char->integer #\a))))

; integer? -> #rx"[a-z]+"
(define (integer->base-26-string i)
  (list->string (map integer->base-26-char (integer->digits i 26))))
Alexis King
  • 43,109
  • 15
  • 131
  • 205
  • Hi, Thanks for the answer. Your code is much cleaner than mine, which lumps everything into one function so I have upvoted. Unfortunately, I can't accept your answer, because although it improves my code it does not answer my question - how to handle "A". To see proof of this, feed the string "aardvark" through both the initial function and the function that converts back to string form. I see the result "rdvark", which was the initial problem that uglified what should have been a simple algorithm. The problem is that (* x 0) = 0, so you need a guard character at front. – JimmyM Mar 08 '15 at 08:50
  • This is what leads to the additional code to handle the case when the first letter is "z". – JimmyM Mar 08 '15 at 08:50
  • @JimmyM In base-26, a leading `a` is like a leading `0` in base-10. It has no semantic meaning. With no specification about how long the original input was, there is absolutely no way to know if there were leading zeroes or not once converted to a number (which has no information about its base, it's just a numeric value). – Alexis King Mar 08 '15 at 08:52
  • That's why I needed to add the "b" to be able to convert the strings back, and the edge-case-handling code for "z". In this case, I need to convert to and from 26-ary because I'm creating a trie/binary tree hybrid data structure. I was hoping there'd be a better way - I will accept your answer on the implicit understanding that it's saying no, there isn't a better way (But here are ways your code should be cleaner). EDIT: Although I'm keeping 97 instead of (char->integer #\a) in this one file because they are the same, and this is a personal project that no-one else will see. – JimmyM Mar 08 '15 at 08:55