Split your available memory into two halves. Use one half as a 4-bit counting Bloom filter and the other half as a fixed-size hash table with counts. The role of the counting Bloom filter is to filter out rarely occurring words with high memory efficiency.
Check your 1 TB of words against the initially empty Bloom filter. If a word is already in it and all of its buckets are at the maximum value of 15 (this may be partly or wholly a false positive), pass it through. Otherwise, add it by incrementing its buckets.
Words that passed through get counted; for a majority of words, this is every time but the first 15 times you see them. A small percentage will start to get counted even sooner, bringing a potential inaccuracy of up to 15 occurrences per word into your results. That's a limitation of Bloom filters.
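A minimal sketch of the first pass, under stated assumptions: the names `CountingBloom` and `first_pass` are made up for illustration, BLAKE2b is just one convenient way to derive the bucket indexes, and an unbounded `Counter` stands in for the fixed-size hash table, so overflow is not modelled here.

```python
import hashlib
from collections import Counter


class CountingBloom:
    """Counting Bloom filter with 4-bit saturating buckets, two per byte."""

    MAX = 15  # largest value a 4-bit bucket can hold

    def __init__(self, num_buckets, num_hashes=3):
        # Assumption: num_hashes <= 8, because BLAKE2b yields at most 64 bytes.
        self.n = num_buckets
        self.k = num_hashes
        self.cells = bytearray((num_buckets + 1) // 2)

    def _indexes(self, word):
        # Slice one digest into k 8-byte chunks, one bucket index per chunk.
        digest = hashlib.blake2b(word.encode("utf-8"),
                                 digest_size=8 * self.k).digest()
        return [int.from_bytes(digest[8 * i:8 * i + 8], "little") % self.n
                for i in range(self.k)]

    def _get(self, i):
        byte = self.cells[i >> 1]
        return (byte >> 4) if (i & 1) else (byte & 0x0F)

    def _set(self, i, value):
        byte = self.cells[i >> 1]
        if i & 1:
            self.cells[i >> 1] = (byte & 0x0F) | (value << 4)
        else:
            self.cells[i >> 1] = (byte & 0xF0) | value

    def saturated(self, word):
        """True if every bucket of the word has reached the maximum."""
        return all(self._get(i) == self.MAX for i in self._indexes(word))

    def add(self, word):
        """Increment each distinct bucket of the word, stopping at MAX."""
        for i in set(self._indexes(word)):
            value = self._get(i)
            if value < self.MAX:
                self._set(i, value + 1)


def first_pass(words, bloom):
    """Count only the words that the filter lets through."""
    counts = Counter()
    for word in words:
        if bloom.saturated(word):
            counts[word] += 1   # passed through: count it
        else:
            bloom.add(word)     # still being filtered: only bump its buckets
    return counts
```

With a stream of 20 copies of one word, the first 15 occurrences only fill the filter and the remaining 5 are counted, which is the undercount of up to 15 described above.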
When the first pass is over, you can correct the inaccuracy with a second pass if desired. Deallocate the Bloom filter, and also discard every count that is more than 15 occurrences behind the tenth most frequent word. Go through the input again, this time accurately counting words (using a separate hash table), but ignoring words that have not been retained as approximate winners from the first pass.
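The pruning and recount could look roughly like this. It is a sketch only: `candidates` and `second_pass` are hypothetical names, and `approx_counts` is assumed to be the word-to-count mapping left over from the first pass.

```python
import heapq
from collections import Counter


def candidates(approx_counts, top_n=10, slack=15):
    """Keep words whose approximate count is within `slack` of the top_n-th."""
    if len(approx_counts) <= top_n:
        return set(approx_counts)
    nth = heapq.nlargest(top_n, approx_counts.values())[-1]
    return {word for word, count in approx_counts.items()
            if count >= nth - slack}


def second_pass(words, keep, top_n=10):
    """Recount exactly, ignoring every word that is not a retained candidate."""
    exact = Counter(word for word in words if word in keep)
    return exact.most_common(top_n)
```

The slack of 15 matches the 4-bit bucket width: a word's first-pass count can be short by at most 15, so anything further behind the tenth place than that cannot end up in the true top ten.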
Notes
The hash table used in the first pass may theoretically overflow with certain statistical distributions of the input (e.g., each word exactly 16 times) or with extremely limited RAM. It is up to you to calculate or try out whether this can realistically happen to you or not.
Note also that the bucket width (4 bits in the above description) is just a parameter of the construction. A non-counting Bloom filter (bucket width of 1) would filter out most unique words nicely, but do nothing to filter out other very rarely occurring words. A wider bucket will be more prone to cross-talk between words (because there will be fewer buckets), and it will also reduce the guaranteed accuracy level after the first pass (15 occurrences in the case of 4 bits). But these downsides will be quantitatively insignificant until some point, while I'm imagining the more aggressive filtering effect as completely crucial for keeping the hash table in sub-gigabyte sizes with non-repetitive natural language data.
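To make the bucket-width trade-off concrete, here is a small back-of-the-envelope helper. The function name `bloom_tradeoff` and the 64 MiB budget in the usage lines are assumptions chosen purely for illustration.

```python
def bloom_tradeoff(memory_bytes, bucket_bits):
    """Return (number of buckets, maximum count per bucket) for a budget."""
    buckets = memory_bytes * 8 // bucket_bits
    max_count = (1 << bucket_bits) - 1  # also the worst-case undercount
    return buckets, max_count


budget = 64 * 2**20  # hypothetical 64 MiB filter
for bits in (1, 2, 4, 8):
    buckets, max_count = bloom_tradeoff(budget, bits)
    print(f"{bits}-bit buckets: {buckets} buckets, undercount up to {max_count}")
```

Doubling the bucket width halves the number of buckets (more cross-talk) while raising the number of occurrences a word needs before it reaches the hash table.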
As for the order-of-magnitude memory needs of the Bloom filter itself: these people are working way below 100 MB, and with a much more challenging application ("full" n-gram statistics, rather than threshold 1-gram statistics).