We'll start with your List<string>. I'm going to assume the 64-bit runtime. Numbers for the 32-bit runtime are slightly smaller.
The List itself requires about 32 bytes (allocation overhead, plus internal variables), plus the backing array of strings. The array overhead is 50 bytes, and you need 8 bytes per string for the references. So if you have 100,000 sentences, you'll need at minimum 800,000 bytes for the array.
The strings themselves require something like 26 bytes each, plus two bytes per character. So if your average sentence is 80 characters, you need 186 bytes per string. Multiplied by 100K strings, that's about 18.6 megabytes. Altogether, your list of sentences will take around 20 MB (round number).
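To make the arithmetic above easy to rerun with your own sentence count and length, here is a minimal sketch. The class and method names are made up for illustration, and the constants are the rough 64-bit figures quoted above, not measured values.

```csharp
using System;

static class FlatListEstimate
{
    // Rough 64-bit figures quoted in the answer; these are estimates, not measurements.
    const long ListOverhead = 32;    // the List<string> object itself
    const long ArrayOverhead = 50;   // overhead of the backing array
    const long ReferenceSize = 8;    // one reference per string
    const long StringOverhead = 26;  // per-string overhead
    const long BytesPerChar = 2;     // two bytes per character

    // Estimated bytes for a List<string> holding sentenceCount strings
    // of averageChars characters each.
    public static long TotalBytes(long sentenceCount, long averageChars)
    {
        long array = ArrayOverhead + ReferenceSize * sentenceCount;
        long strings = sentenceCount * (StringOverhead + BytesPerChar * averageChars);
        return ListOverhead + array + strings;
    }

    static void Main()
    {
        long bytes = TotalBytes(100_000, 80);
        Console.WriteLine(bytes + " bytes");
    }
}
```

For 100,000 sentences of 80 characters this gives 19,400,082 bytes, which is where the "around 20 MB" comes from. It ignores any unused capacity in the backing array, which can be larger than the number of items because List<T> grows its array in steps.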
If you split the sentences into words, you now have 100,000 List<string> instances. That's about 5 megabytes just for the List<List<string>>. If we assume 10 words per sentence, then each sentence's list will require about 80 bytes for the backing array, plus 26 bytes per string (total of about 260 bytes), plus the string data itself (10 words of about 8 characters each, or 160 bytes total). So each sentence costs you (again, round numbers) 80 + 260 + 160, or 500 bytes. Multiplied by 100,000 sentences, that's 50 MB.
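The per-sentence figure can be sketched the same way. Again, the names are hypothetical and the constants are the rough numbers from this answer (8 bytes per reference, 26 bytes of overhead per string, 2 bytes per character); the list overhead of roughly 5 MB is counted separately and is not included here.

```csharp
using System;

static class SplitEstimate
{
    // Estimated bytes for the words of one sentence, using the answer's rough figures.
    public static long BytesPerSentence(int wordsPerSentence, int charsPerWord)
    {
        long backingArray = 8L * wordsPerSentence;                  // one reference per word
        long stringOverhead = 26L * wordsPerSentence;               // per-string overhead
        long characterData = 2L * charsPerWord * wordsPerSentence;  // two bytes per character
        return backingArray + stringOverhead + characterData;
    }

    // Estimated bytes for the words of all sentences (list overhead not included).
    public static long TotalBytes(long sentenceCount, int wordsPerSentence, int charsPerWord)
    {
        return sentenceCount * BytesPerSentence(wordsPerSentence, charsPerWord);
    }

    static void Main()
    {
        Console.WriteLine(BytesPerSentence(10, 8) + " bytes per sentence");
        Console.WriteLine(TotalBytes(100_000, 10, 8) + " bytes in total");
    }
}
```

With 10 words of 8 characters, that is 500 bytes per sentence and 50,000,000 bytes for 100,000 sentences.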
So, very rough numbers, splitting your sentences into a List<List<string>> will occupy 55 or 60 megabytes.
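Since these are only estimates, it may be worth measuring on your own machine. The following is a rough sketch, assuming synthetic sentences of ten 8-character words; the class and method names are made up for illustration. It compares GC.GetTotalMemory before and after building the nested list, so the result will vary with runtime version and bitness.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MeasureSplit
{
    // Builds a List<List<string>> from sentenceCount synthetic sentences and
    // returns the approximate growth of the managed heap in bytes.
    public static long MeasureBytes(int sentenceCount)
    {
        // Assumed test data: ten 8-character words separated by single spaces.
        string sentence = string.Join(" ", Enumerable.Repeat("abcdefgh", 10));

        long before = GC.GetTotalMemory(forceFullCollection: true);

        var result = new List<List<string>>(sentenceCount);
        for (int i = 0; i < sentenceCount; i++)
        {
            // Split allocates new word strings on every call.
            result.Add(new List<string>(sentence.Split(' ')));
        }

        long after = GC.GetTotalMemory(forceFullCollection: true);
        GC.KeepAlive(result); // keep the lists alive until after the second reading
        return after - before;
    }

    static void Main()
    {
        long bytes = MeasureBytes(100_000);
        Console.WriteLine(bytes + " bytes");
    }
}
```

Expect the measured value to land in the same range as the estimate rather than match it exactly, because object sizes are rounded up for alignment and the runtime may hold other allocations.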