
I am working on a program that needs to build a grammatically correct sentence from a given set of words. The input is a list of strings, and the output should be a meaningful sentence made from those words plus any other words that are necessary. For example:

Input: {'You' , 'House' , 'Beautiful'}
Output: 'Your house is beautiful' (or) 'you house is beautiful' 
Input: {'Father' , 'Love' , 'Child'}
Output: 'The father loves the child'

How do I implement this with NLTK and/or machine learning?

Any suggestions on how I should go about this? I'm open to even the wildest ideas. Thanks! :)

CodeBender
  • The thing you are doing is a holy grail of AI development :) I also tried this, and all I could think of was linking each word with the others in a list of all English words. The links would work like synapses in the brain: when the app sees two words, or even one, it checks all of their links to other words. Each link has a number that says how strong it is. The problem is that you have to build these links first. To do that you could, for instance, connect the app to Wikipedia, analyze all the articles, and create the links from them. – Jerry Switalski Dec 23 '15 at 12:47
  • https://github.com/karpathy/char-rnn – alvas Dec 23 '15 at 16:50
  • MontyNLGenerator is quite interesting for basic sentences, if you feed it carefully, with a spoon. http://alumni.media.mit.edu/~hugo/montylingua/doc/MontyLingua.html – Eugene Oct 12 '16 at 21:02

1 Answer


In this case you can apply an n-gram model. The idea is that a sentence

I like NLP very much.

gets the following 3-grams:

  1. <s> I like
  2. I like NLP
  3. like NLP very
  4. NLP very much
  5. very much </s>
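A minimal sketch of this padding-and-slicing step in plain Python. It assumes one `<s>` start marker and one `</s>` end marker, as in the list above, and drops the trailing period to match it; the function name `trigrams` is just illustrative. (NLTK also provides a helper, `nltk.util.ngrams`, for the same job.)

```python
def trigrams(sentence):
    # Drop the trailing period, then add one start and one end marker as in the list above.
    tokens = ["<s>"] + sentence.rstrip(".").split() + ["</s>"]
    # Zip three shifted views of the token list to get every run of three consecutive tokens.
    return list(zip(tokens, tokens[1:], tokens[2:]))

for tri in trigrams("I like NLP very much."):
    print(" ".join(tri))
```

This prints the five 3-grams listed above, one per line.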

Then you treat it as a probability model P(word3 | word1 word2), i.e. the probability of the third word given the two words before it.
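The usual way to estimate that probability from raw counts is maximum likelihood: P(word3 | word1 word2) ≈ count(word1 word2 word3) / count(word1 word2). Below is a small sketch on a made-up three-sentence corpus. The corpus and the helper names (`train`, `prob`) are purely illustrative, and `Fraction` is used only so the results print exactly.

```python
from collections import Counter
from fractions import Fraction

def trigrams(sentence):
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return zip(tokens, tokens[1:], tokens[2:])

def train(corpus):
    tri_counts = Counter()
    for sentence in corpus:
        tri_counts.update(trigrams(sentence))
    # Count each (word1, word2) context once for every word3 that follows it.
    ctx_counts = Counter()
    for (w1, w2, _), c in tri_counts.items():
        ctx_counts[(w1, w2)] += c
    return tri_counts, ctx_counts

def prob(w3, w1, w2, tri_counts, ctx_counts):
    ctx = ctx_counts[(w1, w2)]
    if ctx == 0:
        return Fraction(0)  # unseen context: no estimate from trigram counts alone
    return Fraction(tri_counts[(w1, w2, w3)], ctx)

corpus = ["I like NLP very much", "I like cats very much", "I like NLP a lot"]
tri_counts, ctx_counts = train(corpus)
print(prob("NLP", "I", "like", tri_counts, ctx_counts))   # 2/3
print(prob("cats", "I", "like", tri_counts, ctx_counts))  # 1/3
```

"I like" occurs three times, followed twice by "NLP" and once by "cats", which gives the two estimates.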

So your work would be:

  1. Get a lot of data on n consecutive words (e.g. I think https://books.google.com/ngrams has a download option)
  2. For a given set of words, find all n-grams which contain only those words
  3. Find the most likely combination.
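Steps 2 and 3 could be sketched as a brute-force search, assuming a tiny trigram model trained on a made-up corpus. Instead of filtering n-grams explicitly, this scores every ordering of the given words, so only n-grams built from those words (plus the padding markers) ever get looked up. Add-one (Laplace) smoothing is swapped in here as an assumption, so that a single unseen trigram does not zero out a whole candidate. All names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations

def trigrams(tokens):
    padded = ["<s>"] + list(tokens) + ["</s>"]
    return list(zip(padded, padded[1:], padded[2:]))

def train(corpus):
    tri, ctx, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in corpus:
        words = sentence.split()
        vocab.update(words)
        for t in trigrams(words):
            tri[t] += 1       # count of (w1, w2, w3)
            ctx[t[:2]] += 1   # count of the context (w1, w2)
    return tri, ctx, vocab

def log_score(tokens, tri, ctx, vocab):
    # Add-one smoothing: unseen trigrams get a small probability instead of 0.
    v = len(vocab)
    return sum(math.log((tri[t] + 1) / (ctx[t[:2]] + v)) for t in trigrams(tokens))

def best_order(words, tri, ctx, vocab):
    # Brute force: try every ordering of the given words (n! candidates).
    return max(permutations(words), key=lambda p: log_score(p, tri, ctx, vocab))

corpus = ["your house is beautiful", "the house is big", "your car is beautiful"]
model = train(corpus)
print(" ".join(best_order(["house", "your", "is", "beautiful"], *model)))
# your house is beautiful
```

This sketch only reorders the words it is given. Inserting missing words such as "is" or "the" would need a larger candidate search, and the factorial number of permutations quickly becomes impractical for longer inputs.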

Please note:

  • n should be at least 3
  • the bigger n gets, the more likely you are to have to "back off" to shorter n-grams, because you have no data for a particular n-gram (even though it might exist and make sense)
  • even n=5 is already a VERY large amount of data
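As an illustration of "backing off", here is a sketch of one simple scheme, the so-called "stupid backoff" used for very large n-gram collections. It is swapped in here as an example and is not necessarily what the answer had in mind. If the full n-gram was seen, use its relative frequency. Otherwise, drop the oldest context word and multiply the shorter n-gram's score by a fixed penalty (commonly 0.4). The result is a score, not a normalized probability. The corpus and function names are hypothetical.

```python
from collections import Counter

def ngram_counts(corpus, n_max=3):
    # Count every n-gram of length 1..n_max in the padded sentences.
    counts = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for n in range(1, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(word, context, counts, alpha=0.4):
    # context is a tuple of preceding words, e.g. ("I", "like")
    if not context:
        # Base case: relative frequency of the word among all unigrams.
        total = sum(c for k, c in counts.items() if len(k) == 1)
        return counts[(word,)] / total
    full = counts[context + (word,)]
    if full > 0:
        return full / counts[context]
    # Unseen n-gram: back off to a shorter context, with a penalty.
    return alpha * stupid_backoff(word, context[1:], counts, alpha)

counts = ngram_counts(["I like NLP very much", "I like cats"])
print(stupid_backoff("NLP", ("I", "like"), counts))    # 0.5
print(stupid_backoff("very", ("the", "NLP"), counts))  # 0.4 (backs off to the bigram "NLP very")
```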
Martin Thoma