
How can I generate a numeric score for a string, which I can later use to order things alphabetically?

(I'd like to add objectIds to a Redis sorted set based on a name property, which requires a numeric score. My list of things might be too big to sort all at once, hence wanting to score each item individually.)

Words earlier in an alphabetic list should have a lower score, with 'a' = 0.

My naive approach so far (the letter-to-alphabet-position conversion comes from the question "Replace a letter with its alphabet position"):

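```javascript
function alphaScoreString(inputString) {
    let score = 0

    inputString
        .trim()
        .toLowerCase()
        .split('')
        .forEach((letter, index) => {
            // 'a' -> 0, ..., 'z' -> 25; digits and symbols give a negative number or NaN and are skipped
            const letterNumber = parseInt(letter, 36) - 10
            if (letterNumber >= 0) {
                // each letter is weighted by 1 / (index + 1)
                score += letterNumber / (index + 1)
            }
        })

    return score * 1000
}
```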

This does not work:

```
alphaScoreString('bb')
1500
alphaScoreString('bc')
2000
alphaScoreString('bbz')
9833.333333333334
```

You can see that 'bbz' has a higher score than 'bc', whereas it should be lower, as 'bbz' would come before 'bc' in an alphabetical list.

Aluan Haddad
RYFN
  • This is a very interesting question. Just to clarify - are the strings limited in length? If not, then it would be very difficult to implement such a function since, for example, the string 'a' * n would always have to produce a score less than the string 'b', no matter how big n is. – Tomer Ariel Jun 09 '22 at 17:10
  • Yeah it's a tricky one! The strings technically have no limit, but in this case they're last names, so we could assume fewer than... 25 characters? I don't think anyone would notice if some longer names were in the wrong order due to the 26th letter. – RYFN Jun 09 '22 at 19:06
  • If words earlier in an alphabetic list should have a lower score, and the point is to later order the words by that score...**why not just sort the words normally**? – Wyck Jun 09 '22 at 19:24
  • Because later I will only have the score and an ID for the object, not the word itself. – RYFN Jun 09 '22 at 19:26
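A minimal sketch of one possible approach, assuming names contain only the letters a–z (other characters are dropped) and that, as discussed in the comments above, exact ordering is only needed for a bounded prefix. Weighting letters by 1 / (index + 1) lets a late 'z' outweigh an earlier difference; a positional scheme avoids that by treating each letter as a base-27 digit, so the first differing letter always decides. Digit 0 is reserved for "no letter", which makes 'bb' score below 'bbz' but means 'a' scores 27^10 rather than 0. Redis sorted-set scores are double-precision floats, so only integers up to 2^53 are exact; this sketch therefore keeps the first 11 letters (27^11 - 1 is below `Number.MAX_SAFE_INTEGER`), fewer than the 25 mentioned above, and names sharing those 11 letters tie. `alphaScore` and `MAX_LETTERS` are hypothetical names, not from the question.

```javascript
// Hypothetical helper: numeric order of the scores matches alphabetical
// order of the first MAX_LETTERS letters (a-z only).
const MAX_LETTERS = 11; // 27 ** 11 - 1 is below Number.MAX_SAFE_INTEGER

function alphaScore(input) {
  // Keep only a-z; spaces, apostrophes, hyphens and accented letters are dropped.
  const letters = input.toLowerCase().replace(/[^a-z]/g, '');
  let score = 0;
  for (let i = 0; i < MAX_LETTERS; i++) {
    // 'a'..'z' become base-27 digits 1..26; 0 means "no letter here",
    // so a prefix like 'bb' scores below 'bbz'.
    const digit = i < letters.length ? letters.charCodeAt(i) - 96 : 0;
    score = score * 27 + digit;
  }
  return score;
}

console.log(alphaScore('bb') < alphaScore('bbz')); // true
console.log(alphaScore('bbz') < alphaScore('bc')); // true
```

With a sketch like this, `alphaScore(name)` could be passed as the score to `ZADD`, with the object ID as the member.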

2 Answers


You can convert each character to its character code (`charCodeAt`) and pad it to 4 digits, e.g. "H" = 72 is padded to "0072". Because every character then takes the same number of digits, comparing the resulting digit strings still gives the 'alphabetical order' of the original strings (as long as every character code is below 10000, which holds for ASCII letters):

```javascript
const instring = "Hello World";
let output = "";
for (let i = 0; i < instring.length; i++) {
  // Pad each character code to 4 digits, e.g. "H" (72) -> "0072"
  const newchar = String(instring.charCodeAt(i)).padStart(4, '0');
  output = output.concat(newchar);
}
console.log(output);
```
Donald Koscheka
  • Interesting idea! But I need the output to be of type number, whereas this is of type string. – RYFN Jun 09 '22 at 19:27
  • The issue with using a number (assuming longint here) is that you can never get a unique identifier for each string. There are more strings than there are 64-bit integers. If you could limit the number to 64 bits, this approach would work. – Donald Koscheka Jun 10 '22 at 14:31
  • I'm totally OK with duplicates here. I'm using this to generate an ordered set of `score: object-id`, but to use the score in a redis sorted set, it needs to be numeric – RYFN Jun 10 '22 at 14:47
  • Could you truncate the 'numerical string' to the first n (e.g. n=16) digits and convert it to a number? – Donald Koscheka Jun 10 '22 at 14:50
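The truncation suggested in the last comment can be sketched as follows, under stated assumptions: every character code is below 10000 (so 4 digits per character suffice), and the digit string is right-padded with zeros before truncating. The padding matters: without it, 'bc' would become 980099 and 'bbz' 9800980122, putting 'bc' first. The sketch keeps 15 digits rather than 16, because some 16-digit integers exceed `Number.MAX_SAFE_INTEGER` (9007199254740991) and would lose precision. `paddedCodeScore` and `DIGITS` are hypothetical names. At 4 digits per character, only the first three characters plus part of the fourth affect the score, and ordering is by character code, so it is case-sensitive ('Z' sorts before 'a').

```javascript
// Hypothetical helper for the "truncate the numeric string" idea.
// Assumes every character code is below 10000, so 4 digits per character suffice.
const DIGITS = 15; // any 15-digit integer is below Number.MAX_SAFE_INTEGER

function paddedCodeScore(input) {
  let digits = '';
  for (let i = 0; i < input.length; i++) {
    digits += String(input.charCodeAt(i)).padStart(4, '0');
  }
  // Right-pad with zeros (as if followed by character code 0), then keep only
  // the first DIGITS digits so every string yields a number of the same width.
  return Number(digits.padEnd(DIGITS, '0').slice(0, DIGITS));
}

console.log(paddedCodeScore('bb') < paddedCodeScore('bbz')); // true
console.log(paddedCodeScore('bbz') < paddedCodeScore('bc')); // true
```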
Answer written in Python:

```python
# Each letter gets a two-digit value: 'a' = 0.01 ... 'z' = 0.26
char_codex = {'a':0.01, 'b':0.02, 'c':0.03, 'd':0.04, 'e':0.05, 'f':0.06,
              'g':0.07, 'h':0.08, 'i':0.09, 'j':0.10, 'k':0.11, 'l':0.12,
              'm':0.13, 'n':0.14, 'o':0.15, 'p':0.16, 'q':0.17, 'r':0.18,
              's':0.19, 't':0.20, 'u':0.21, 'v':0.22, 'w':0.23, 'x':0.24,
              'y':0.25, 'z':0.26}

def alphabetic_score(word):
  score = 0.0
  for index, letter in enumerate(word.lower()):
    # Shift each later letter two more decimal places to the right
    # (divide by 100 ** index), so an earlier letter outweighs all later
    # ones, up to floating-point precision.
    # Characters outside a-z raise a KeyError.
    score += char_codex[letter] / 100 ** index
  return score

print(alphabetic_score('bbz') < alphabetic_score('bc'))  # True
```
Akuha
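For comparison with the JavaScript in the question, here is a minimal sketch of the same two-digits-per-letter idea built as an integer rather than a sum of fractions, which avoids accumulating floating-point rounding error. It assumes a–z input (anything else, or any position past the end of the word, is treated as "no letter", i.e. 00), and `base100Score` and `WIDTH` are hypothetical names. Two decimal digits per letter fit only 8 letters into a safe integer (the largest score, 2626262626262626, is below `Number.MAX_SAFE_INTEGER`); base 27 would fit 11, since each position only needs 27 distinct values.

```javascript
// Hypothetical integer port of the base-100 scheme above:
// 'a' -> 01, ..., 'z' -> 26, and 00 means "no letter".
const WIDTH = 8; // 8 letters * 2 digits = 16 digits; max is 2626262626262626

function base100Score(word) {
  const lower = word.toLowerCase();
  let score = 0;
  for (let i = 0; i < WIDTH; i++) {
    // charCodeAt returns NaN past the end of the string, failing the range check
    const code = lower.charCodeAt(i) - 96;
    // Anything outside a-z (or past the end of the word) becomes 00
    const digit = code >= 1 && code <= 26 ? code : 0;
    score = score * 100 + digit;
  }
  return score;
}

console.log(base100Score('bb') < base100Score('bbz')); // true
console.log(base100Score('bbz') < base100Score('bc')); // true
```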