Is there any software out there which can do the following?

Given an English sentence like

"He likes baked beans",

I change "he" to "I" and the sentence changes to

"I like baked beans"

(note the dropped "s")

or

"She has her hair in a ponytail"

I change "she" to "he" and the sentence changes to

"He has his hair in a ponytail".

Similarly, I would like to be able to change the sentence into the past tense:

"She had her hair in a ponytail".

Does such software even exist?

Kara

2 Answers

I don't know of any.

However, you might want to have a look at NLTK (the Natural Language Toolkit, nltk.org), a Python library for natural language processing. It has many features that could be helpful here, such as part-of-speech (POS) tagging.

This assumes, of course, that you would be fine with writing such software yourself. Sorry if that is not relevant to what you want to do.
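
To give a feel for what "writing it yourself" might involve, here is a minimal pure-Python sketch of the two examples from the question. It does not use NLTK; the lookup tables and the function names (`to_base`, `to_3sg`, `swap_subject`) are hypothetical, hand-written for illustration, and it assumes the sentence starts with a subject pronoun followed directly by a present-tense verb.

```python
# Toy sketch: swap the subject pronoun, then repair verb agreement and
# possessives. All tables are hand-written assumptions, not NLTK data.

# subject pronoun -> (is third person singular, possessive determiner)
PRONOUNS = {
    "i": (False, "my"),
    "you": (False, "your"),
    "he": (True, "his"),
    "she": (True, "her"),
    "we": (False, "our"),
    "they": (False, "their"),
}

# A few irregular present-tense verbs: third-person form -> base form.
IRREGULAR_3SG = {"has": "have", "does": "do", "goes": "go"}
BASE_TO_3SG = {base: third for third, base in IRREGULAR_3SG.items()}


def to_base(verb):
    """Strip third-person-singular marking: 'likes' -> 'like'."""
    if verb in IRREGULAR_3SG:
        return IRREGULAR_3SG[verb]
    if verb.endswith("ies"):
        return verb[:-3] + "y"
    if verb.endswith(("ches", "shes", "sses", "xes", "zes")):
        return verb[:-2]
    if verb.endswith("s") and not verb.endswith("ss"):
        return verb[:-1]
    return verb


def to_3sg(verb):
    """Add third-person-singular marking: 'like' -> 'likes'."""
    if verb in BASE_TO_3SG:
        return BASE_TO_3SG[verb]
    if verb.endswith("y") and verb[-2:-1] not in "aeiou":
        return verb[:-1] + "ies"
    if verb.endswith(("ch", "sh", "ss", "x", "z")):
        return verb + "es"
    return verb + "s"


def swap_subject(sentence, new_subject):
    """Replace the leading subject pronoun and fix agreement.

    Assumes words[0] is a pronoun from PRONOUNS and words[1] is a
    present-tense verb; anything else is out of scope for this sketch.
    """
    words = sentence.split()
    old_key, new_key = words[0].lower(), new_subject.lower()
    old_3sg, old_poss = PRONOUNS[old_key]
    new_3sg, new_poss = PRONOUNS[new_key]

    verb = words[1]
    if old_3sg and not new_3sg:
        verb = to_base(verb)
    elif new_3sg and not old_3sg:
        verb = to_3sg(verb)

    # Naive: also rewrites 'her' when it is an object pronoun.
    rest = [new_poss if w == old_poss else w for w in words[2:]]
    subject = "I" if new_key == "i" else new_key.capitalize()
    return " ".join([subject, verb] + rest)


print(swap_subject("He likes baked beans", "I"))
print(swap_subject("She has her hair in a ponytail", "he"))
```

Even this toy shows where the difficulty lies: every irregular verb needs a table entry, and without POS tagging the code cannot tell a possessive "her" from an object "her", which is where a library like NLTK would come in.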

houbysoft

I don't know of any either, but I'll try to give some suggestions.

  • Snowball can normalize many words using the Porter stemming algorithm, but the endings it produces are often incorrect. What might be possible, though, is to take a wordlist such as Moby's CROSSWD.TXT, use Snowball to find common roots, and guess the tense from the ending (e.g. a word ending in "ed" or "d" might be past tense). PyStemmer provides Python wrappers, if that's what you use, but I couldn't find any Windows binaries, so for my purposes I had to build it myself.

    Bear in mind that this method is error-prone: it normalizes, for example, both "tries" and "try" to "tri", and there are many exceptions where it doesn't work. Some implementations (I believe there's one in NLTK, as mentioned by houbysoft) have many exceptions pre-programmed, but English is so irregular that a rule which fixes the inflection of some words breaks others.

  • Another way is to parse the WordNet data, which I believe groups words into "classes" by inflection and lists exceptions for words that don't fit the rules. It's a pretty heavy task, though: I've tried to parse it using the various man pages and haven't succeeded yet (see http://wordnet.princeton.edu/man/morphy.7WN.html for information on parsing inflections).

  • You could try parsing spelling data from OpenOffice or something similar, as spelling dictionaries usually group words together into "classes". This is especially attractive for regional variants (e.g. Australian or British English), although it doesn't tell you which inflection each word is.
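
The first suggestion (guess the tense from the ending, check the result against a wordlist, and keep a table of exceptions) could be sketched roughly as follows. This is a simplified stand-in, not Snowball: it uses plain suffix stripping, and `WORDS`, `IRREGULAR_PAST`, and `guess_past` are hypothetical names with tiny hand-written data where a real wordlist such as CROSSWD.TXT would go.

```python
# Stand-in for a real wordlist; a real one would be loaded from a file.
WORDS = {"bake", "like", "need", "see", "try", "walk"}

# Hand-written exception table for irregular past forms (assumed, tiny).
IRREGULAR_PAST = {"had": "have", "went": "go", "ate": "eat", "did": "do"}


def guess_past(word, words=WORDS):
    """Return a guessed base form if `word` looks like a past tense.

    Returns None when no candidate root is found in the wordlist.
    """
    word = word.lower()
    if word in IRREGULAR_PAST:
        return IRREGULAR_PAST[word]

    candidates = []
    if word.endswith("ied"):
        candidates.append(word[:-3] + "y")   # tried -> try
    if word.endswith("ed"):
        candidates.append(word[:-2])         # walked -> walk
        candidates.append(word[:-1])         # baked -> bake

    # Accept the first candidate that is a known word.
    for candidate in candidates:
        if candidate in words:
            return candidate
    return None


for form in ["had", "baked", "tried", "walked", "need", "seed"]:
    print(form, "->", guess_past(form))
```

Checking candidates against the wordlist stops "need" from being read as a past tense, but "seed" is still reported as the past of "see", which is exactly the kind of error described above.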

Anyway, I hope this helps. I think the NLTK library is a good place to start, as it has a Porter stemmer (along with various other stemming implementations) and lots of example code.

See also "How do I do word Stemming or Lemmatization?"

cryo