
I am doing some preprocessing, and I want to simplify contractions like "don't" to "do not" so that the algorithm works better. I checked NLTK and didn't find anything handy. I could use a crude lookup method, but the problem is the "'s" used with proper nouns, as in "Jon's". Please suggest an approach.

The earlier question "Expanding English language contractions in Python" doesn't have a good answer for proper noun usage.

Community
Suresh Mali
  • Use the "crude" replacements suggested in the other question for the contractions listed there, and for proper nouns such as "Jon's car" try the "of" syntax, i.e. "the car of Jon". That should not be too hard to do – gkusner Jul 16 '14 at 20:06
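A minimal sketch of the comment's "of" rewrite, assuming the possessor is a single capitalized word and the possessed thing is a single word. The regex, the `rewrite_possessives` name, and the small pronoun skip-list are hypothetical illustrations, not taken from either linked question.

```python
import re

# Hypothetical rule: "<Name>'s <noun>" -> "the <noun> of <Name>".
# Assumes one capitalized word as the owner and one lowercase word as the noun.
POSSESSIVE = re.compile(r"\b([A-Z][a-z]+)'s\s+([a-z]+)\b")

# Capitalized pronouns whose "'s" is a contraction ("It's late"), not a
# possessive; this list is an assumption and is far from complete.
PRONOUNS = {"It", "He", "She", "That", "There", "Here", "What", "Who", "Let"}

def rewrite_possessives(text: str) -> str:
    """Turn simple possessives like "Jon's car" into "the car of Jon"."""
    def repl(m):
        owner, thing = m.group(1), m.group(2)
        if owner in PRONOUNS:
            return m.group(0)  # leave contractions such as "It's late" alone
        return f"the {thing} of {owner}"
    return POSSESSIVE.sub(repl, text)

print(rewrite_possessives("I borrowed Jon's car."))  # I borrowed the car of Jon.
```

Note that this rule cannot tell "Jon's car" (possessive) from "Jon's going home" (Jon is going), which it would wrongly turn into "the going of Jon home"; that ambiguity is exactly what the second answer below targets.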

2 Answers


You can use available lookup tables for that:

http://en.wikipedia.org/wiki/Wikipedia:List_of_English_contractions

http://grammar.about.com/od/words/a/EnglishContractions.htm
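A minimal sketch of a lookup-table expander in the spirit of this answer. The `CONTRACTIONS` dictionary is a small hand-picked excerpt (an assumption; the linked lists are much longer), and ambiguous forms ending in "'s" are deliberately left out so that "Jon's" is never touched.

```python
import re

# Small excerpt of a contraction table (assumed entries, unambiguous forms only).
CONTRACTIONS = {
    "don't": "do not",
    "doesn't": "does not",
    "didn't": "did not",
    "can't": "cannot",
    "won't": "will not",
    "isn't": "is not",
    "i'm": "i am",
    "they're": "they are",
    "we've": "we have",
}

# One alternation over all keys, longest first so longer forms win.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(CONTRACTIONS, key=len, reverse=True)) + r")\b",
    flags=re.IGNORECASE,
)

def expand(text: str) -> str:
    """Replace every contraction found in the table with its expansion."""
    def repl(m):
        word = m.group(0)
        expansion = CONTRACTIONS[word.lower()]
        # Keep a leading capital, e.g. "Don't" -> "Do not".
        if word[0].isupper():
            expansion = expansion[0].upper() + expansion[1:]
        return expansion
    return PATTERN.sub(repl, text)

print(expand("Don't worry, I'm sure they're fine."))
```

Leaving "'s" out of the table keeps proper nouns safe, but it also means "it's" and "he's" stay unexpanded; handling those needs some look at the surrounding words.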

Daniel

I had to work on this for a related NLP project, and I decided to tackle the problem since there didn't seem to be anything here. You can check my expander GitHub repository if you are interested.

It uses POS tagging and named entity recognition (NER) to deal with nouns, in addition to the basic expansions. A disambiguation function is also included to deal with the harder case of ambiguous contractions such as "'s". The NER tagging is the essential part here: it recognizes nouns that are names, which I then replace with a pronoun so that the grammatical context can be analyzed and the contraction expanded where appropriate.
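The sketch below is a much cruder stand-in for that idea and is not the repository's code: instead of NER, POS tagging, and pronoun substitution, it looks only at the word after "'s". The cue sets `HAS_CUES` and `IS_CUES` and the function name are hypothetical, and anything not matching a cue is assumed to be a possessive.

```python
import re

# Hypothetical cue words (an assumption, not the repository's rules):
# the word after "'s" decides how it is expanded.
HAS_CUES = {"been", "got", "gotten", "had"}
IS_CUES = {"a", "an", "the", "not", "going", "coming", "here", "there",
           "very", "always", "never", "already"}

# Any word followed by "'s" and then another word.
APOSTROPHE_S = re.compile(r"\b(\w+)'s\s+(\w+)\b")

def expand_apostrophe_s(text: str) -> str:
    """Expand "'s" to "has"/"is" only when the next word suggests it.

    Everything else (e.g. "Jon's car") is left alone as a possessive.
    """
    def repl(m):
        word, nxt = m.group(1), m.group(2)
        cue = nxt.lower()
        if cue in HAS_CUES:
            return f"{word} has {nxt}"
        if cue in IS_CUES:
            return f"{word} is {nxt}"
        return m.group(0)  # assume possessive: keep the text unchanged
    return APOSTROPHE_S.sub(repl, text)

print(expand_apostrophe_s("Jon's going home, but Jon's car is here."))
# Jon is going home, but Jon's car is here.
```

A fixed cue list misses many cases ("Jon's tired", "Jon's left"), which is why a tagger-based approach like the one described above is more robust, at the cost of speed.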

It takes a long time to run on sentences, but this is my approach to the problem, and it performs fairly well on the test cases included in the code.

For more details, please have a look at the other answer on the older question, or at the GitHub repository directly.

Yannick