After lemmatizing a text I have a list of lemmas. For each element of this list I would like to determine whether it is a word ("cat", "dog", "go", "red") or a non-word (".", "rand_yh4jhdf", "'''", "100x200", "42,44,46", "22:00", "xxx___BATMAN___xxx"). Does this problem have a simple solution? How can I distinguish words from non-words with Python and NLTK?
UPD. (regarding the question of what a word is) I want to clean total garbage out of my list: remove whatever is clearly not a word, and leave complicated edge cases untouched.
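To make the requirement concrete, here is a minimal sketch of the kind of conservative filter I have in mind. It is only an assumption about what "clearly not a word" could mean: a token is kept if it consists of letters, optionally joined by single internal apostrophes or hyphens. The names `WORD_RE`, `looks_like_word`, and `drop_garbage` are hypothetical, and the sketch uses only the standard library rather than NLTK.

```python
import re

# Hypothetical heuristic, not an NLTK feature: a "word" is one or more runs
# of Unicode letters, optionally joined by a single apostrophe or hyphen.
# [^\W\d_] matches any word character that is neither a digit nor "_".
WORD_RE = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def looks_like_word(token):
    """Return True if the whole token matches the letter-only pattern."""
    return WORD_RE.fullmatch(token) is not None


def drop_garbage(lemmas):
    """Keep only the tokens that look like words, preserving order."""
    return [token for token in lemmas if looks_like_word(token)]


lemmas = ["cat", ".", "rand_yh4jhdf", "dog", "'''", "100x200",
          "42,44,46", "22:00", "xxx___BATMAN___xxx", "go", "red"]
print(drop_garbage(lemmas))
```

A filter like this deliberately lets letter-only strings through even when they are not real words; those would count as edge cases. A dictionary lookup (for example against a word list such as NLTK's `words` corpus) could probably tighten it, at the cost of rejecting rare or new words.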