A friend of mine had an idea to make a speed reading program that displays words one by one (much like currently existing speed reading programs). However, the program would filter out words that aren't completely necessary to the meaning (if you want to skim something).
I have started implementing this program, but I'm not quite sure what the algorithm for removing "unimportant" words should be.
My idea is to parse the sentence (I'm currently using the Stanford Parser), assign each word a weight based on how important it is to the sentence's meaning, and then start removing the words with the lowest weights. After each removal I'll check how "different" the original parse tree and the new tree are, and keep removing the lowest-weighted word until the two trees become too different (the threshold will be a constant determined via a one-time "calibration" process for each user). Finally, I'll go through each word of the shortened sentence and try to replace it with a simpler or shorter synonym (again while trying to retain the meaning).
There will also be special cases for very common words like "the," "a," and "of."
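To make the idea concrete, here is a rough sketch of the pruning loop I have in mind. Everything in it is a placeholder: `weight()` is a toy heuristic (a real version would use POS tags and dependency roles from the parse tree), and `difference()` is a simple token-overlap measure standing in for an actual tree-comparison metric — it is not Stanford Parser code.

```python
# Toy set of low-importance function words; a real version would
# derive importance from the parse tree, not a hard-coded list.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "do", "you"}

def weight(word):
    """Lower weight = less important (assumed heuristic)."""
    if word.lower().strip("'?.,!") in FUNCTION_WORDS:
        return 0.0
    return 1.0 + len(word) / 10.0  # longer content words rank higher

def difference(original, shortened):
    """Fraction of original tokens removed -- a stand-in for a real
    tree-edit distance between the two parse trees."""
    return 1.0 - len(shortened) / len(original)

def shorten(sentence, max_difference=0.5):
    """Greedily drop the lowest-weight word until the next removal
    would push the difference past the calibrated threshold."""
    words = sentence.split()
    shortened = list(words)
    while len(shortened) > 1:
        idx = min(range(len(shortened)), key=lambda i: weight(shortened[i]))
        trial = shortened[:idx] + shortened[idx + 1:]
        if difference(words, trial) > max_difference:
            break
        shortened = trial
    return " ".join(shortened)
```

The `max_difference` parameter is where the per-user calibration constant would plug in.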
For example:
"Billy said to Jane, 'Do you want to go out?'"
Would become:
"Billy told Jane 'want go out?'"
This retains essentially all of the sentence's meaning while shortening it significantly.
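The final synonym-substitution pass might look something like the sketch below. The hand-built table here is purely illustrative; a real version would pull candidates from something like WordNet and re-parse to check that the substitution preserves the sentence's structure.

```python
# Hypothetical table mapping longer phrasings to shorter equivalents.
SHORTER_SYNONYMS = {
    "said to": "told",
    "purchase": "buy",
    "utilize": "use",
}

def simplify(sentence):
    # Apply each substitution; multi-word entries are handled by
    # plain string replacement in this toy version.
    for long_form, short_form in SHORTER_SYNONYMS.items():
        sentence = sentence.replace(long_form, short_form)
    return sentence
```

For example, `simplify("Billy said to Jane 'want go out?'")` produces the shortened form shown above.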
Is this a good approach for the algorithm? If so, how should I assign the weights, what tree-comparison algorithm should I use, and is the synonym substitution in the right place (i.e., should it be done before I try to remove any words)?