I am preprocessing tweets using Python. However, many of the words are short forms of other words, like luv and kool, and there are also abbreviations like brb and ttyl.
Right now, the only approach I can think of is a huge hashmap with the short forms as keys and the full words or expansions as values. Is there a better way to approach this using NLP?
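For reference, the hashmap approach described above takes only a few lines in Python. This is a minimal sketch that assumes a small hand-written dictionary. The entries and the name `expand_slang` are illustrative and do not come from any library. A real list would be much larger and loaded from a file:

```python
import re

# Illustrative, hand-written mapping (hypothetical entries); a real list
# would be far larger and loaded from a file instead of hard-coded.
SLANG = {
    "luv": "love",
    "kool": "cool",
    "brb": "be right back",
    "ttyl": "talk to you later",
}

# Matches word-like tokens (letters, digits, apostrophes) so that
# punctuation such as "brb!" does not prevent a lookup.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def expand_slang(text: str, mapping: dict[str, str] = SLANG) -> str:
    """Replace each known short form with its expansion (case-insensitive lookup)."""
    def replace(match: re.Match) -> str:
        word = match.group(0)
        # Unknown words are returned unchanged.
        return mapping.get(word.lower(), word)

    return TOKEN_RE.sub(replace, text)


print(expand_slang("brb, luv this kool song"))
# be right back, love this cool song
```

Using `re.sub` with a function keeps punctuation and spacing intact and only swaps the tokens it finds in the mapping. Each lookup is a plain dictionary access, so a very large mapping does not slow it down noticeably.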
NOTE: I know the question may seem too vague, but please don't report it. I have asked it so that amateurs can benefit from this knowledge.
PS: Is there a nicely formatted text list that I can download and use? The links posted are good, but when I copy and paste the content, it is not in an easily parsable format.
-tags (or take the line with an acronym, skip 2 lines, take the line with its definition, skip 4 lines, and repeat). Not even regexps are needed there, so it is indeed nicely formatted text.
– Nikita Astrakhantsev Feb 28 '15 at 10:35
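The line-skipping approach from the comment can be sketched as follows. The offsets (acronym, skip 2, definition, skip 4) come from the comment. The function name `parse_fixed_layout`, the sample data, and the file name in the code comment are hypothetical. The offsets should be checked against the actual copied page text before relying on them:

```python
def parse_fixed_layout(lines, skip_after_acronym=2, skip_after_definition=4):
    """Pair acronyms with definitions from a repeating fixed-offset layout.

    Assumed layout per entry: 1 acronym line, `skip_after_acronym` ignored
    lines, 1 definition line, `skip_after_definition` ignored lines.
    """
    period = 1 + skip_after_acronym + 1 + skip_after_definition
    acronyms = lines[0::period]
    definitions = lines[skip_after_acronym + 1::period]
    # zip() stops at the shorter list, so a trailing acronym without a
    # definition is dropped instead of raising an error.
    return {a.strip().lower(): d.strip() for a, d in zip(acronyms, definitions)}


# Hypothetical sample: blank strings stand in for the lines to be skipped.
SAMPLE = [
    "BRB", "", "", "Be right back", "", "", "", "",
    "TTYL", "", "", "Talk to you later", "", "", "", "",
]

print(parse_fixed_layout(SAMPLE))
# {'brb': 'Be right back', 'ttyl': 'Talk to you later'}

# With a real copy of the list saved to a (hypothetical) file:
# with open("acronyms.txt", encoding="utf-8") as f:
#     slang = parse_fixed_layout(f.read().splitlines())
```

Slicing with a fixed step avoids regular expressions entirely, as the comment suggests. The trade-off is that a single missing or extra line in the source shifts every later pair, so spot-checking a few entries of the result is worthwhile.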