I'm analyzing the dataset from this Kaggle competition:
https://www.kaggle.com/c/predicting-loan-default/data
My variable emp_length takes about 3000 distinct values. Many of them are duplicates or share a keyword (for example account, accountant, accounting, account specialist, acct.), and some contain spelling errors or abbreviations. I want to reduce the number of distinct values by collapsing these variants into simplified names, and then encode the result as numeric values. I tried extracting keywords with text mining in R, but I'm not convinced that this is the right approach. Does anyone have a better idea?
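
To make the goal concrete, here is a minimal sketch of the kind of grouping I have in mind. It is written in Python only for illustration, and the same idea should carry over to R. The keyword map `KEYWORD_PREFIXES`, the helper names, and the `"other"` fallback label are all assumptions made up for this example, not anything that comes with the dataset.

```python
import re

# Hypothetical keyword map (an assumption for this sketch):
# canonical label -> prefixes that identify it.
KEYWORD_PREFIXES = {
    "accounting": ("account", "acct"),
}


def normalize(raw):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", raw.lower())
    return re.sub(r"\s+", " ", text).strip()


def canonical_label(raw, keyword_prefixes=KEYWORD_PREFIXES, default="other"):
    """Return the first label whose prefixes match any token of the value."""
    tokens = normalize(raw).split()
    for label, prefixes in keyword_prefixes.items():
        # str.startswith accepts a tuple of candidate prefixes.
        if any(token.startswith(prefixes) for token in tokens):
            return label
    return default


def encode(values):
    """Map raw values to integer codes, one code per canonical label."""
    labels = [canonical_label(v) for v in values]
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels], codes


values = ["account", "Accountant", "accounting", "Account Specialist", "acct.", "nurse"]
encoded, codes = encode(values)
print(encoded)  # [0, 0, 0, 0, 0, 1]
print(codes)    # {'accounting': 0, 'other': 1}
```

Prefix matching like this only catches variants that start with a known keyword, so misspelled values would still fall into "other". A fuzzy string comparison (for example `difflib.get_close_matches` in Python) might serve as a fallback, but I have not tested how well that works on this data.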