If I have a block of English text, what's the best way to strip out the "filler" words such as "the", "it", "or", "we" and "us", leaving only the words that carry the real, core content of the text?
I'm brainstorming a way to automatically tie blocks of text together based on how similar they are in keyword composition.
I can't be the first one to imagine this. Is there a popular, effective way this can be accomplished using C#?
Update
I am essentially trying to link one block of text to n "related" blocks of text, where the primary "content" is so similar that each related block could be considered additional information for the text it is related to.
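To make the idea concrete, here is a rough sketch of the kind of thing I have in mind. It is only an illustration under my own assumptions: the `KeywordSimilarity` class, its method names, and the tiny stop-word list are all made up for this example (not taken from any library), and using the overlap of keyword sets (Jaccard similarity) as the score is just one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordSimilarity
{
    // Tiny illustrative stop-word list (an assumption; a real list would be much longer).
    private static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "the", "it", "or", "we", "us", "a", "an", "and", "of", "to", "in", "is" },
        StringComparer.OrdinalIgnoreCase);

    // Lowercase the text, pull out the words, and drop the stop words.
    public static HashSet<string> Keywords(string text)
    {
        var words = Regex.Matches(text.ToLowerInvariant(), @"[a-z']+")
                         .Cast<Match>()
                         .Select(m => m.Value)
                         .Where(w => !StopWords.Contains(w));
        return new HashSet<string>(words);
    }

    // Jaccard similarity: shared keywords divided by all distinct keywords (0.0 to 1.0).
    public static double Similarity(string a, string b)
    {
        var ka = Keywords(a);
        var kb = Keywords(b);
        if (ka.Count == 0 && kb.Count == 0) return 0.0; // nothing to compare

        int shared = ka.Count(w => kb.Contains(w));
        int total = ka.Count + kb.Count - shared;
        return (double)shared / total;
    }
}
```

With something like this, I would call `KeywordSimilarity.Similarity(blockA, blockB)` for each pair of blocks and treat pairs above some threshold as "related". I don't know whether this is anywhere near the accepted approach, which is really what I'm asking.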