The task is to group a given string of text into sections according to restrictions set on each section. Say we have a string S, "Lorem ipsum dolorem.", and 3 sections. Each section has restrictions on the amount of text it can hold, specified as a number of characters or a number of words. For example, the first section can have a minimum of five and a maximum of ten characters. The second section can have a minimum of one and a maximum of five words, with each word between 2 and 10 characters long. The third section can have the same restriction as the first.

We need to use all of the available text, or else there is no grouping solution. Words cannot be split (we cannot break a word into multiple parts when grouping). All other things being equal, solutions that keep sentences together are better.

What is the most efficient way to group the text?

tabdulla

1 Answer

If the restrictions only count characters and words, this is a case for regular expressions: http://en.wikipedia.org/wiki/Regular_expressions

EDIT

E.g., consider the following:

 sed -E -e 's/([a-z]{2,10}) (([a-z]{2,4} ){1,2})([a-z]{2,10})/G:\1 G:\2 G:\4/'

If one applies this to "aaa bb bbbb ccccc", one gets:

 G:aaa G:bb bbbb  G:ccccc
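
The same pattern can be tried outside sed. Here is a minimal sketch in Python, assuming the standard `re` module; the function name `group_with_regex` is made up for illustration. Unlike the sed substitution, it uses `fullmatch`, so it only succeeds when all of the text is consumed, which is what the question asks for.

```python
import re

# Same pattern as the sed command: one word of 2-10 letters, then one or two
# words of 2-4 letters (each followed by a space), then one word of 2-10 letters.
PATTERN = re.compile(r"([a-z]{2,10}) (([a-z]{2,4} ){1,2})([a-z]{2,10})")


def group_with_regex(text):
    """Return the three groups, or None if the whole text does not fit."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    # Group 3 is the inner repetition, so only groups 1, 2 and 4 are sections.
    # strip() drops the trailing space that the sed output keeps in group 2.
    return [match.group(1), match.group(2).strip(), match.group(4)]


print(group_with_regex("aaa bb bbbb ccccc"))
```

A regex engine stops at the first match it finds, so this yields at most one grouping per pattern.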
Matthias
  • I don't think so. There can be a range of words or characters in each section, and some solutions are preferable to others. For example, it is preferable to keep sentences together, all other things being equal. – tabdulla Apr 05 '12 at 06:06
  • I do not understand what you mean by "a range of words or characters in each section". Each regex can cover ranges, can't it? As for the preferences: test for the nicest match first; if there is no match, try your second choice, and so on. Alternatively, you can use a regex tool like awk and assign different values to different kinds of matches. – Matthias Apr 05 '12 at 06:11
  • I put an example in the answer. Is that what you mean, or am I missing the point of your question? – Matthias Apr 05 '12 at 06:38
  • That makes sense. Can it give all possible solutions? – tabdulla Apr 05 '12 at 06:49
  • Not with sed. However, it is possible, see http://stackoverflow.com/questions/6643730/producing-all-possible-matches-of-a-regular-expression – Matthias Apr 05 '12 at 06:54
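
For listing every solution, one alternative to a regex engine is a plain backtracking search over word boundaries. The sketch below is only an illustration under stated assumptions, not a method proposed in the thread: the names `Section`, `groupings`, `split_sentences` and `best_grouping` are hypothetical, every section is assumed to receive at least one word, character counts include the single spaces between words, and the per-word length restriction from the question is left out for brevity. Sentence preference is approximated by counting section boundaries that fall after a word without closing punctuation.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    unit: str      # "chars" or "words" (assumed encoding of a restriction)
    minimum: int
    maximum: int

    def accepts(self, words):
        # Characters are counted on the words joined by single spaces.
        size = len(" ".join(words)) if self.unit == "chars" else len(words)
        return self.minimum <= size <= self.maximum


def groupings(words, sections):
    """Yield every split of `words` into consecutive runs, one per section."""
    if not sections:
        if not words:          # all of the text must be used
            yield []
        return
    first, rest = sections[0], sections[1:]
    for cut in range(1, len(words) + 1):
        head = words[:cut]
        if first.accepts(head):
            for tail in groupings(words[cut:], rest):
                yield [head] + tail


def split_sentences(groups):
    """Count section boundaries that cut through a sentence (lower is better)."""
    return sum(1 for g in groups[:-1] if not re.search(r"[.!?]$", g[-1]))


def best_grouping(text, sections):
    """Return the grouping that splits the fewest sentences, or None."""
    candidates = list(groupings(text.split(), sections))
    return min(candidates, key=split_sentences, default=None)


# The example from the question: 5-10 characters, 1-5 words, 5-10 characters.
example = [Section("chars", 5, 10), Section("words", 1, 5), Section("chars", 5, 10)]
print(list(groupings("Lorem ipsum dolorem.".split(), example)))
```

The search is exhaustive, so its running time can grow quickly with the number of words and sections. Caching results per (word position, section index) pair would be one way to reduce repeated work if only the best grouping is needed.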