
I'm looking to do the opposite of what is described in these questions: "Tools for text simplification (Java)" and "Finding meaningful sub-sentences from a sentence".

That is, take two simple sentences and combine them into a compound sentence.

Are there any algorithms to do this?

Ezequiel
  • Why do you need to combine sentences? What is the domain? (E.g., could you provide examples of simple sentences and the compounds you want?) – Nikita Astrakhantsev May 20 '15 at 21:28
  • I'm curious – why do you want to do that? I haven't heard of any approaches to combining sentences, although some (but rather few) attempts at text summarisation try to construct sentences from bits and pieces (phrases or n-grams) found throughout a document. Of course, you can just concatenate sentences with 'and', but if you strive for a more elaborate approach using subordinate clauses with the appropriate conjunction ("because", "while", "although"...) or even nesting with relative clauses, this is going to be difficult. – lenz May 20 '15 at 21:36



I'm pretty sure that you will not be able to compound sentences as in the example from the linked question (John played golf. John was the CEO of a company. -> John, who was the CEO of a company, played golf), because that requires a level of language understanding far beyond what is currently possible.

So it seems the best option is simply to replace the period with a comma and concatenate the simple sentences. (If you have to choose which sentences in a text to combine, you can try simple heuristics, such as approximating semantic similarity by the number of shared words, or tools such as those based on WordNet.) I'd guess that in most cases human readers can infer the missing conjunction from context.
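A minimal sketch of that heuristic in Python, assuming a toy stopword list and plain regex tokenisation (both are illustrative assumptions, not taken from any library). The helper names `content_words`, `overlap`, `most_similar_pair` and `join_with_comma` are hypothetical; a WordNet-based similarity could be swapped in for `overlap`.

```python
import re
from itertools import combinations

# Toy stopword list (assumption); a real system would use a fuller list.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "was", "on", "for",
             "it", "he", "she", "they", "this", "that", "there"}


def content_words(sentence: str) -> set[str]:
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS}


def overlap(s1: str, s2: str) -> int:
    """Crude similarity proxy: the number of shared content words."""
    return len(content_words(s1) & content_words(s2))


def most_similar_pair(sentences: list[str]) -> tuple[str, str]:
    """Return the pair of sentences with the most words in common (first pair wins ties)."""
    return max(combinations(sentences, 2), key=lambda pair: overlap(*pair))


def join_with_comma(s1: str, s2: str) -> str:
    """Blunt compounding: drop the first sentence's final period and append the second
    after a comma. The second sentence's first letter is lowercased only if its first
    word is a stopword (e.g. "The"), so names like "John" keep their capital."""
    first = s1.rstrip().rstrip(".")
    second = s2.strip()
    if second.split()[0].strip(",;:").lower() in STOPWORDS:
        second = second[0].lower() + second[1:]
    return f"{first}, {second}"


sentences = ["John played golf.", "The weather was sunny.", "John was the CEO of a company."]
pair = most_similar_pair(sentences)
print(join_with_comma(*pair))
```

Here the two sentences about John share the word "john", so they are picked and joined as "John played golf, John was the CEO of a company." No conjunction is chosen; as noted above, the reader is left to infer it.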

Of course, you could develop more sophisticated solutions, but that requires either a narrow domain (e.g., all sentences share a very similar structure) or tools that can determine the relations between sentences, such as cause and effect. I'm not aware of such tools and doubt they exist, because this level (sentences and phrases) is much more diverse and sparse than the level of words and collocations.
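To illustrate what a narrow-domain rule might look like, here is a hypothetical sketch that assumes every sentence is "<one-word proper-noun subject> <predicate>." and coordinates two predicates that share a subject. The function names and the `FUNCTION_WORDS` list are illustrative assumptions; outside that assumed structure the rule simply declines to merge.

```python
from typing import Optional

# Assumed domain: each sentence starts with a one-word, capitalised subject.
# Capitalised function words are rejected so "The cat..." + "The dog..." is not merged.
FUNCTION_WORDS = {"the", "a", "an", "it", "this", "that", "there", "he", "she", "they"}


def split_subject(sentence: str) -> tuple[str, str]:
    """Treat the first word as the subject; return (subject, predicate)."""
    subject, _, predicate = sentence.strip().rstrip(".").partition(" ")
    return subject, predicate


def merge_shared_subject(s1: str, s2: str) -> Optional[str]:
    """Coordinate two predicates under a shared subject with "and", or return None
    when the domain assumption does not hold."""
    subj1, pred1 = split_subject(s1)
    subj2, pred2 = split_subject(s2)
    if not pred1 or not pred2:
        return None
    if subj1 != subj2 or not subj1[:1].isupper() or subj1.lower() in FUNCTION_WORDS:
        return None
    return f"{subj1} {pred1} and {pred2}."


print(merge_shared_subject("John played golf.", "John was the CEO of a company."))
print(merge_shared_subject("The cat sat.", "The dog barked."))
```

This yields "John played golf and was the CEO of a company." for the first pair and `None` for the second. Note that it produces coordination, not the relative-clause version from the linked question: choosing which fact becomes the "who..." clause is exactly the kind of understanding the rule does not have.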

Nikita Astrakhantsev