I am trying to filter sentences out of a corpus (the Brown corpus). I want to remove sentences that are too long (> 25 tokens) or too short (< 4 tokens), and also remove any sentence that contains a rare word, meaning a word that appears fewer than 8 times in the corpus. Every time I try, I get either an error message or an empty list. This is my attempt:
lens = [w for w in corpus.sents() if len(w)>=25 and len(w)<= 4]
I get an empty list as output:
out: []
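The empty list seems to follow from the condition itself: no sentence length can be both >= 25 and <= 4, so the "and" test is never true for any sentence. Below is a minimal sketch of a length filter that keeps the sentences in between, assuming the goal is to keep lengths from 4 to 25 tokens inclusive. The helper name keep_by_length and the toy sentences are made up for illustration; corpus.sents() could presumably be passed in place of the toy list.

```python
def keep_by_length(sents, min_len=4, max_len=25):
    """Keep sentences whose token count is between min_len and max_len inclusive.

    The bounds are assumptions taken from the question (4 and 25 tokens).
    """
    return [s for s in sents if min_len <= len(s) <= max_len]


# Hypothetical toy data standing in for corpus.sents()
toy_sents = [
    ["Hi", "."],                          # 2 tokens: too short
    ["The", "cat", "sat", "down", "."],   # 5 tokens: kept
    ["word"] * 30,                        # 30 tokens: too long
]

# The original condition: a length cannot be >= 25 and <= 4 at the same time
impossible = [s for s in toy_sents if len(s) >= 25 and len(s) <= 4]

kept = keep_by_length(toy_sents)
print(impossible)
print(kept)
```

Writing the test as a keep condition (min_len <= len(s) <= max_len) rather than a remove condition avoids having to negate an "or" by hand.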
I am also not sure how to include the rare-word condition in this list comprehension. Do I have to build a FreqDist of the corpus first?
How can I remove sentences that are very long, very short, or contain rare words? If anyone knows and can explain how to do it, it would be much appreciated :)
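To make the goal concrete, here is a minimal sketch of the combined filter, under a few assumptions: word counts are taken over the whole corpus before any sentence is removed, counting is case-sensitive, and a sentence is dropped if any one of its tokens is rare. The function name filter_sentences and the toy data are hypothetical. It uses collections.Counter so that it runs without NLTK; as far as I know nltk.FreqDist is a Counter subclass, so it should be usable in the same way.

```python
from collections import Counter


def filter_sentences(sents, min_len=4, max_len=25, min_count=8):
    """Keep sentences of acceptable length that contain no rare words.

    A word is treated as rare if it occurs fewer than min_count times
    across all sentences (an assumption about what "rare" means here).
    """
    # Materialise the sentences so they can be iterated twice
    sents = [list(s) for s in sents]

    # Count every token once, over the whole (unfiltered) corpus
    freq = Counter(w for s in sents for w in s)

    return [
        s for s in sents
        if min_len <= len(s) <= max_len
        and all(freq[w] >= min_count for w in s)
    ]


# Hypothetical toy corpus; min_count is lowered to 2 so the effect is visible
toy_sents = [
    ["the", "cat", "sat", "down"],
    ["the", "cat", "sat", "down"],
    ["the", "cat", "saw", "a", "zebra"],  # "saw", "a", "zebra" occur only once
    ["the", "cat"],                       # too short
]

filtered = filter_sentences(toy_sents, min_count=2)
print(filtered)
```

With NLTK installed and the Brown corpus downloaded, the same function could presumably be called as filter_sentences(brown.sents()) after "from nltk.corpus import brown", keeping the default thresholds of 4, 25, and 8.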