The search keyword you're looking for is "dictionary". i.e. data structures that can be used to store a list of words, and test other strings for being present or absent in the dictionary.
Your idea is more compact than storing every word separately, but far less compact than a good data structure like a DAWG. As you note, it isn't obvious how to optimally choose how to overlap your strings. What you're doing is a bit like what a lossless compression scheme (like gzip) would do. If you don't need to check words against your compact dictionary, maybe just use gzip or LZMA to compress a sorted word list. Let their algorithms find the redundancy and represent it compactly.
I looked into dictionaries for a recent SO answer that caught my interest: Memory-constrained external sorting of strings, with duplicates combined&counted, on a critical server (billions of filenames)
For a dictionary that doesn't have to have new words added on the fly, a Directed Acyclic Word Graph is the way to go. You match a string against it by following graph nodes until you either hit a point where there's no edge matching the next character, or you get to the end of your input string and find that the node in the DAWG is marked as being a valid end-of-word. (Rather than simply a substring that is only a prefix of some words). There are algorithms for building these state-machines from a simple array-of-words dictionary in reasonable time.
Your method can only take advantage of redundancy when a whole word is a substring of another word, or end-of-one, start-of-another. A DAWG can take advantage of common substrings everywhere, and is also quite fast to match words against. Probably comparable speed to binary-searching your data structure, esp. if the giant string is too big to fit in the cache. (Once you start exceeding cache size, compactness of the data structure starts to outweigh code complexity for speed.)
Less complex but still efficient is a Trie (or Radix Trie), where common prefixes are merged, but common substrings later in words don't converge again.
If you don't need to modify your DAWG or Trie at all, you can store it efficiently in a single block of memory, rather than dynamically allocating each node. You didn't say why you didn't want to use a Trie, and also didn't acknowledge the existence of the other data structures that do this job better than a plain Trie.