I have a set of words and want to construct a grammatically correct phrase or sentence out of them, if at all possible. What I have tried so far:

  • From a reference text corpus, calculate the average position of each word within a sentence;
  • Sort the words in the set by that average position and join them with spaces.
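
For concreteness, the two steps above might look like the following minimal sketch. The toy corpus, the function names, and the choice to measure position as a zero-based word index are all assumptions made for illustration, not details from the question.

```python
from collections import defaultdict

def average_positions(sentences):
    """Map each word to its mean zero-based index across all sentences."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sentence in sentences:
        for index, word in enumerate(sentence.lower().split()):
            totals[word] += index
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

def order_by_average_position(bag, positions, default=0.0):
    # Words never seen in the corpus fall back to `default` (an assumption).
    return " ".join(sorted(bag, key=lambda w: positions.get(w, default)))

# Hypothetical three-sentence corpus.
corpus = [
    "the dog chased the cat",
    "the cat sat on the mat",
    "a dog sat on the rug",
]
positions = average_positions(corpus)
result = order_by_average_position(["cat", "the", "chased", "dog"], positions)
print(result)
```

On this toy corpus the result is "dog chased the cat": a word such as "the" gets a single averaged position even though it occurs in several slots, which is one reason the output tends to look odd.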

The problem with this approach is that most of the time it produces bizarre phrases that make no sense. Is there a technique that can accomplish this (assuming I'm only working with English)?

George
  • Do you just have a bag of words, or a text to generate more text from? Also, what do you mean by _meaningful_ or by _phrases that make no sense_? Take a look at [this other question](http://stackoverflow.com/questions/18391602/what-does-generate-do-when-using-nltk-in-python), which touches on generating text from a source/seed text using n-grams with Python's NLTK. ([This project](http://pdos.csail.mit.edu/scigen/) deals with this at an academic level.) – arturomp Aug 28 '13 at 15:50
  • Do you just want grammatically correct phrases? Is "colorless green ideas sleep furiously" a meaningful sentence? – Kevin Aug 28 '13 at 15:56
  • @amp I have bags of words and want to generate a grammatically correct phrase from each bag. Ideally it would use all the words in the bag; each bag has fewer than 10 words. Thanks for the links, I will take a look. – George Aug 28 '13 at 17:50
  • @Kevin Yes, grammatically correct phrases will be enough. "colorless green ideas sleep furiously" would be nice. – George Aug 28 '13 at 17:51

3 Answers

You can use an n-gram model to generate text. Maybe this is of help: http://www.uspleste.usp.br/ivandre/papers/improvedTextGenNgramStat.pdf

A common approach is to collect all trigrams (3-grams) from a corpus and then pick each next word according to how often it follows the previous two words.
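
A rough sketch of that idea in plain Python follows. The `<s>`/`</s>` padding tokens, the function names, and the tiny corpus are assumptions for illustration; a real model would be trained on a much larger corpus and would usually add smoothing.

```python
import random
from collections import Counter, defaultdict

def train_trigrams(sentences):
    """Count which word follows each pair of consecutive words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        # Two start markers so the first real word also has a two-word context.
        tokens = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            model[(a, b)][c] += 1
    return model

def generate(model, rng, max_words=20):
    """Sample words one at a time, weighted by trigram counts."""
    context = ("<s>", "<s>")
    words = []
    while len(words) < max_words:
        followers = model.get(context)
        if not followers:
            break
        choices, weights = zip(*followers.items())
        word = rng.choices(choices, weights=weights)[0]
        if word == "</s>":
            break
        words.append(word)
        context = (context[1], word)
    return " ".join(words)

# Hypothetical toy corpus.
corpus = [
    "the dog chased the cat",
    "the cat sat on the mat",
    "a dog sat on the rug",
]
model = train_trigrams(corpus)
print(generate(model, random.Random(0)))
```

Note that this generates free text from the corpus; it does not by itself restrict the output to a given bag of words.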

bogs

You can look at this example of a Markov chain: http://phpir.com/text-generation

Micromega

If you only have the bag of words, I think you need to

  1. Look up all the possible part-of-speech tags for each word
  2. Combine them in grammatically/syntactically valid ways

However, this will not necessarily give you meaningful sentences. They will likely be odd, although perhaps not if your bag of words is very constrained, as seems to be the case.
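
The two steps above could be sketched roughly as follows. Both the `LEXICON` (word to possible tags) and the `PATTERNS` whitelist of "valid" tag sequences are hypothetical stand-ins; in practice the tags would come from a dictionary or tagger and validity from a real grammar.

```python
from itertools import permutations

# Hypothetical toy lexicon: every tag each word can take.
LEXICON = {
    "the": {"DET"},
    "dog": {"NOUN"},
    "cat": {"NOUN"},
    "chased": {"VERB"},
    "old": {"ADJ", "NOUN"},
}

# Hypothetical whitelist of tag sequences treated as grammatical.
PATTERNS = {
    ("DET", "NOUN", "VERB", "DET", "NOUN"),
    ("DET", "ADJ", "NOUN", "VERB"),
    ("DET", "NOUN", "VERB"),
}

def fits(words, pattern):
    """True if each word can take the tag at its position in the pattern."""
    return len(words) == len(pattern) and all(
        tag in LEXICON.get(word, ()) for word, tag in zip(words, pattern)
    )

def grammatical_orderings(bag):
    """Return every ordering of the bag that matches some allowed pattern."""
    results = set()
    for order in permutations(bag):
        if any(fits(order, pattern) for pattern in PATTERNS):
            results.add(" ".join(order))
    return sorted(results)

orderings = grammatical_orderings(["dog", "the", "chased", "cat", "the"])
print(orderings)
```

For this bag the sketch returns both "the cat chased the dog" and "the dog chased the cat": each is grammatical under the toy patterns, and nothing here chooses between them. Trying every permutation is only practical because the bags are small (fewer than 10 words).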

If you have a corpus (which I missed the first time I read your question), then you should use it along with something like NLTK's generate() function, which uses n-grams to generate text.
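
This is not NLTK's generate(); as a plain-Python sketch of the same n-gram idea adapted to a fixed bag, one option is to score every ordering of the bag with bigram counts from the corpus and keep the best one. The add-one smoothing, the function names, and the toy corpus are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import permutations

def train_bigrams(sentences):
    """Count bigrams and how often each word appears as a left context."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return bigrams, contexts

def log_score(order, bigrams, contexts, vocab_size):
    """Sum of add-one-smoothed bigram log-probabilities for one ordering."""
    tokens = ["<s>", *order, "</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

def best_ordering(bag, bigrams, contexts):
    vocab_size = len(contexts) + 1  # +1 for the end marker
    # Sorting the candidates makes tie-breaking deterministic.
    candidates = sorted(set(permutations(bag)))
    best = max(candidates, key=lambda o: log_score(o, bigrams, contexts, vocab_size))
    return " ".join(best)

# Hypothetical toy corpus.
corpus = [
    "the dog chased the cat",
    "the cat sat on the mat",
    "a dog sat on the rug",
]
bigrams, contexts = train_bigrams(corpus)
best = best_ordering(["sat", "mat", "the", "on", "cat", "the"], bigrams, contexts)
print(best)
```

On this toy corpus the best-scoring ordering is "the cat sat on the mat". Unlike sorting by average position, the score depends on which words appear next to each other, and every word in the bag is used exactly once.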

arturomp