1

I am working on implementing dynamic version of RLE compression.

On this version I only insert characters count in the RLE Code if it's more than 1, other than that I keep it as its original state.

The problem I encounter here when trying to encode numbers in the original input text.

Is there any way to encode them like alphabet characters without losing efficiency of the RLE compression ratio.

Example

text = "aabccc 123 ghe dd12 Goooooal"

I want to encode it like:

2ab3c 123 ghe 2d12 G5oal

But I need some kind of encoding for 123 and 12 in RLE code.

Oghli
  • 2,200
  • 1
  • 15
  • 37
  • You could use some special character to separate the number from the character then repeat anything you need to escape "1" times. – mousetail Sep 08 '22 at 10:57
  • 1
    If your alphabet is 7 bits ASCII, all the extended values are free for use. –  Sep 08 '22 at 10:59
  • 1
    You need an escape character to distinguish between your repeat-number and numbers in the original text. https://en.wikipedia.org/wiki/Escape_character – MrSmith42 Sep 08 '22 at 11:12
  • @MrSmith42 so the RLE code with escape character will be now look like: `2ab3c \1\2\3 ghe 2d\1\2 G5oal`. – Oghli Sep 08 '22 at 11:17
  • @MrSmith42 In this case it will be inefficient approach because it will increase number of characters by 1 for each escape character of the digit. – Oghli Sep 08 '22 at 11:23
  • @mousetail It could work but in this case you should distinguish between special character in RLE code and input or remove this special character from the input. – Oghli Sep 08 '22 at 11:36
  • @Oghli No, you can easily escape them by repeating them 1ce – mousetail Sep 08 '22 at 11:38
  • @mousetail yes it could work. but there are some cases when this special character repeated specific number times in input text. – Oghli Sep 08 '22 at 11:45
  • Super easy then. Repeat them some other number of times then 1 in those cases – mousetail Sep 08 '22 at 11:46
  • @mousetail in this case it will be inefficient because it increases number of encoding characters in RLE code -> meaning more size. – Oghli Sep 08 '22 at 11:49
  • No it will be exactly as efficient as without escaping in those cases. Only when they are repeated once will it increase the size. – mousetail Sep 08 '22 at 11:49
  • `2ab3c \1\2\3 ghe 2d\1\2 G5oal. ` is the naive and maybe not space efficient version to escape it. You could also instead escape your count-values instead `\2ab\3c 123 ghe \2d12 G\5oal. ` This way you could decide only to encode repetitions of at least 4 to ensure that your encoding actually compresses the String. `aabccc 123 ghe dd12 G\5oal. ` – MrSmith42 Sep 08 '22 at 12:22
  • 1
    Perhaps the encoded version might be `2ab3c \123\ ghe 2d\12\ G5oa` where the first instance of `\\` shifts into numeric mode, the next instance unshifts, etc. Whether this presents an unacceptable increase in the length of the encoded string depends on how many numbers there are in the original string, and how long they are. I think this may be what @mousetail originally proposed. – High Performance Mark Sep 08 '22 at 12:37
  • @HighPerformanceMark yes I think this approach may work in my case. I don't think that what mousetail suggested in his comment. – Oghli Sep 08 '22 at 13:06

0 Answers0