I am new to topic modeling. After tokenizing my text with NLTK, I end up with tokens such as the following:
'1-in', '1-joerg', '1-justine', '1-lleyton', '1-million', '1-nil', '1of', '00pm-ish', '01.41', '01.57', '0-40', '0-40f'
I believe these tokens are meaningless and will not help in the rest of my pipeline. Is that correct? If so, does anyone have an idea for a regular expression, or some other approach, that could remove them from my token list? They vary so much that I could not come up with a single regexp for this purpose.
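To make the question concrete, here is a minimal sketch of the crudest filter I can think of: drop every token that contains at least one digit. The function name `drop_numeric_tokens` and the extra tokens 'tennis' and 'match' are made up for illustration, and the rule itself is only an assumption on my part. It may well be too aggressive, since it also discards tokens like '1-million' that might carry some meaning.

```python
import re

# Assumption: any token containing a digit is treated as noise.
HAS_DIGIT = re.compile(r"\d")

def drop_numeric_tokens(tokens):
    """Return only the tokens that contain no digit characters."""
    return [t for t in tokens if not HAS_DIGIT.search(t)]

# A few of the tokens from above, plus two hypothetical ordinary words.
tokens = ['1-in', '1-joerg', '1-million', '1of', '00pm-ish',
          '01.41', '0-40f', 'tennis', 'match']

cleaned = drop_numeric_tokens(tokens)
print(cleaned)
```

With this rule only the two ordinary words survive, which is why I am unsure whether a blanket "contains a digit" test is the right criterion or whether something more selective is needed.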