Suppose I have a dataset in a pandas DataFrame:
Sr.No|query
-----------
1. tiger
2. tigers
3. lion
4. lionx
5. ilion
6. 56tigers
The resulting dataset should contain:
Sr.No|query
-----------
1. tiger
2. tiger
3. lion
4. lion
5. lion
6. tiger
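
To make the example reproducible, here is a minimal sketch that builds both tables. The variable names `df` and `expected` are only illustrative, and I am assuming `Sr.No` is an ordinary integer column rather than the index.

```python
import pandas as pd

# Input data, as in the first table.
df = pd.DataFrame({
    "Sr.No": [1, 2, 3, 4, 5, 6],
    "query": ["tiger", "tigers", "lion", "lionx", "ilion", "56tigers"],
})

# Desired result, as in the second table: each noisy query
# is mapped to the canonical word it most resembles.
expected = pd.DataFrame({
    "Sr.No": [1, 2, 3, 4, 5, 6],
    "query": ["tiger", "tiger", "lion", "lion", "lion", "tiger"],
})
```
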
I have no idea how to approach this, so links or book names, ideally with code, would be preferred. I know it is a broad topic and may involve nltk and clustering or nearest-neighbour methods such as kNN. Any kind of help will be appreciated.