0

I have a 400 character string made up of a variety of 40 possible characters. The 40 characters are a-z A-N. In this example I have:

nnnnnnnnnnnnnnnnnnnnnnnnunnnnnnnnnnnnuuunnnuxunnnnnnnnnnuxddnnnuxduuuuuunnnnuxddnnnuxddddddunnnnuxddnuuuxdddddduuuuuuxddnnnuxduuudducuccuxddnnnuxdunodduiuiiuxddnnnuxduoodducuccuxddnnnuxdddddduiuiiuxddnnnuxdddddducuccuxddnnnuxdddddduiuiiuxddnnnuxduuudducuccuxddnuuuxdunodduiuiiuxddnnnuxduoodducuccuxddnnnuxdddddduuuuuuxddnnnuxxxxxxxunnnnuxxxnnnnuuuuuuuunnnnnuuunnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn

I need to have the compressed version be made out of URL-safe characters. I am using a-z A-N and 0-9 - . _ ~ ( ) ' ! * : @ . ; to represent how many times characters in the original string repeat up to 23 times. If a character repeats more than 23 times, the character and numerical representation is added onto the prior compressed string.

After compressing, the resulting string is 246 characters:

n;un-u1n1uxun8uxd0n1uxdu4n2uxd0n1uxd4un2uxd0nu1xd4u4xd0n1uxdu1d0ucuc0uxd0n1uxdunod0uiui0uxd0n1uxduo0d0ucuc0uxd0n1uxd4uiui0uxd0n1uxd4ucuc0uxd0n1uxd4uiui0uxd0n1uxdu1d0ucuc0uxd0nu1xdunod0uiui0uxd0n1uxduo0d0ucuc0uxd0n1uxd4u4xd0n1ux5un2ux1n2u6n3u1n;n(

The Javascript function to compress the original string is:

// 40 single URL-safe characters 
let singleCharList = [
"a",
"b",
"c",
"d",
"e",
"f",
"g",
"h",
"i",
"j",
"k",
"l",
"m",
"n",
"o",
"p",
"q",
"r",
"s",
"t",
"u",
"v",
"w",
"x",
"y",
"z",
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
]

// 23 single URL-safe characters for counting (0 is actually 1)
let singleCharCountList = [
"0",
"1",
"2",
"3",
"4",
"5",
"6",
"7",
"8",
"9",
"-",
".",
"_",
"~",
"(",
")",
"'",
"!",
"*",
":",
"@",
",",
";",
]

function compressString(stringToCompress) {
  var compressedString = ""
  var lastChar = ""
  var inARowCount = 0
  var maxInARowCount = singleCharCountList.length
  for (i = 0; i < stringToCompress.length + 1; ++i) {
let currentChar = stringToCompress.charAt(i)
if (currentChar === lastChar && inARowCount < maxInARowCount) {
  inARowCount += 1
} else {
  if (inARowCount > 0) {
    singleCharCount = singleCharCountList[inARowCount - 1]
    compressedString = compressedString + lastChar + singleCharCount
  } else {
    compressedString = compressedString + lastChar
  }
  inARowCount = 0
}
lastChar = currentChar
  }
  return compressedString
}

// To inflate the string back to its original 400 characters, I have: 

function inflateString(stringToInflate) {
  var inflatedString = ""
  var lastChar = ""
  for (i = 0; i < stringToInflate.length; ++i) {
let currentChar = stringToInflate.charAt(i)
// currentChar is part of singleCharCountList 
// and lastChar is part of singleCharList
if (singleCharCountList.indexOf(currentChar) > -1 &&
  singleCharList.indexOf(lastChar) > -1) {
  var loop = singleCharCountList.indexOf(currentChar) + 1
  for (var j = 0; j < loop; ++j) {
    inflatedString = inflatedString + lastChar
  }
}
// currentChar is part of singleCharList
// and lastChar is part of singleCharCountList
else if (singleCharList.indexOf(currentChar) > -1 &&
  singleCharCountList.indexOf(lastChar) > -1) {
  inflatedString = inflatedString + currentChar
}
// currentChar is part of singleCharList
// and lastChar is part of singleCharList
else if (singleCharList.indexOf(currentChar) > -1 &&
  singleCharList.indexOf(lastChar) > -1) {
  inflatedString = inflatedString + currentChar
}
// current char is part of singleCharList
// and lastChar is not part of singleCharList or singleCharCountList
else if (singleCharList.indexOf(currentChar) > -1 &&
  singleCharList.indexOf(lastChar) == -1 &&
  singleCharCountList.indexOf(lastChar) == -1) {
  inflatedString = inflatedString + currentChar
}
lastChar = currentChar
  }
  return inflatedString
}


const str = 'nnnnnnnnnnnnnnnnnnnnnnnnunnnnnnnnnnnnuuunnnuxunnnnnnnnnnuxddnnnuxduuuuuunnnnuxddnnnuxddddddunnnnuxddnuuuxdddddduuuuuuxddnnnuxduuudducuccuxddnnnuxdunodduiuiiuxddnnnuxduoodducuccuxddnnnuxdddddduiuiiuxddnnnuxdddddducuccuxddnnnuxdddddduiuiiuxddnnnuxduuudducuccuxddnuuuxdunodduiuiiuxddnnnuxduoodducuccuxddnnnuxdddddduuuuuuxddnnnuxxxxxxxunnnnuxxxnnnnuuuuuuuunnnnnuuunnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn';

const compressed = compressString(str);
console.log(compressed.length);
const inflate = inflateString(compressed)
console.log(str.length,inflateString.length)

Can I do better? I'm moderately happy with 246 in my example, but this can vary a lot depending on the original content.

Chewie The Chorkie
  • 4,896
  • 9
  • 46
  • 90

1 Answers1

1

First compress the string using a standard compression tool, such as zlib. Your example compresses from 400 bytes down to 93 with zlib. Then encode the binary result into your restricted set of 75 characters, which will expand it by about 30%, resulting in about 120 bytes. Still significantly compressed from the original 400.

That encoding can be done many ways, but the fastest is likely reverse Huffman coding, where you take the binary to represent a sequence of Huffman codes of a flat probability distribution of 75 symbols. You end up with 53 6-bit codes and 22 7-bit codes. That gives 6.29 bits per symbol, close to the optimal 6.23 bits per symbol.

It is a waste of time to develop your own, substandard, compression method with a weak attempt at run-length coding.

Mark Adler
  • 101,978
  • 13
  • 118
  • 158