1

I have data as below:

Carepnter
Carpentor
Labourer
Labor
Labour
Housewife
House Wife
housewife.

I want to clean the data and fix the spelling mistakes, but not manually, because the dataset is huge. Because of the spelling mistakes, these 50 to 60 occupations have turned into around 2000 distinct values.

Husnain Iqbal

1 Answer

0

Start from the correct spelling of each occupation, for instance carpenter, and look for the strings in your data that are close to it. You can then take the n closest matches for each one.

Another question here, "Python: find closest string (from a list) to another string", also deals with finding similar strings. Two of the approaches from its answers could work for you:

  1. difflib.get_close_matches

  2. Spelling corrector
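As a rough illustration of the first option, here is a minimal sketch using difflib.get_close_matches from the Python standard library. The CANONICAL list, the normalize helper, and the 0.6 cutoff are assumptions for the example, not something from your data: you would replace the list with your real 50 to 60 occupations and tune the cutoff on a sample.

```python
import difflib

# Hypothetical list of correct spellings; replace with the real occupations.
CANONICAL = ["carpenter", "labourer", "housewife"]


def normalize(raw, canonical=CANONICAL, cutoff=0.6):
    """Return the closest canonical occupation, or None if nothing is close enough."""
    # Lowercase, trim whitespace and trailing punctuation, and drop inner spaces
    # so that "House Wife" and "housewife." compare as "housewife".
    cleaned = raw.strip().lower().strip(".,;:").replace(" ", "")
    # n=1 keeps only the best match; cutoff is the minimum similarity (0 to 1).
    matches = difflib.get_close_matches(cleaned, canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None


raw_values = [
    "Carepnter", "Carpentor", "Labourer", "Labor",
    "Labour", "Housewife", "House Wife", "housewife.",
]

# Map every raw value to its cleaned form; None marks values to review by hand.
mapping = {value: normalize(value) for value in raw_values}
for value, fixed in mapping.items():
    print(f"{value!r} -> {fixed!r}")
```

Values that come back as None did not reach the cutoff and are worth checking manually. If the data sits in a pandas column, the same function could be applied to the column's unique values first, so each distinct misspelling is matched only once.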

Kim Tang