A lot of the algorithms that you are describing in this question are called entropy coders (Shannon-Fano, Huffman, arithmetic, etc.). Entropy coders are used to compress sequences of symbols (often bytes), where some symbols are much more frequent than others. Simple entropy coding of symbols (letters) for compressing natural language will only yield about a 2:1 compression.
Instead, popular modern lossless compression techniques for text include methods like LZ77, LZW, and BWT. Loosely speaking, the LZ family involves building up a dictionary of recurring short symbol sequences (we'll call them "words") and then uses pointers to reference those words. Some of the implementations of LZ like LZ77 and LZW can be fairly simple to code up but probably do not yield the highest compression ratios. See for example this video: https://www.youtube.com/watch?v=j2HSd3HCpDs. On the other end of the spectrum, LZMA2, is a relatively more complicated variant with a higher compression ratio.
The Burrows-Wheeler transform (BWT) provides a clever alternative to the dictionary methods. I'll refer you to the Wikipedia article, https://en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transform
In a nutshell, though, it produces a (invertible) permutation of the original sequence of bytes that can often be compressed very effectively by run-length encoding followed by an entropy coder.
If I had to code a compression technique from scratch, for simplicity, I'd probably go with LZW or LZ77.