
The following question was posted by @ruhroe about an hour ago. I was about to post an answer when it was taken down. That's unfortunate, as I thought it was rather interesting. I'm putting it back up in case the OP sees this and also to give others an opportunity to post solutions.

The original question (which I've edited):

The problem is to split a string on some spaces in the string, based on criteria which depend in part on a number given by the user. If that number were, say, 5, each substring would contain either:

  • one word having 5 or more characters or
  • as many consecutive words (separated by spaces) as possible, provided the resulting string has at most 5 characters.

For example, if the string were:

"abcdefg fg hijkl mno pqrs tuv wx yz"

the result would be:

["abcdefg", "fg", "hijkl", "mno", "pqrs", "tuv", "wx yz"]
  • "abcdefg" is on a separate line because it has at least five characters.
  • "fg" is on a separate line because it has 5 or fewer characters, but combining it with the following word (with a space between them) gives "fg hijkl", which has more than 5 characters.
  • "hijkl" is on a separate line because it satisfies both criteria.

How can I do that?
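The two rules can be made concrete with a small checker. This is a sketch, not part of the original question: `valid_split?` is a hypothetical name, and it requires each piece to be its words joined by single spaces, so a piece's length is simply its string length.

```ruby
# Hypothetical checker (not from the question): does `pieces` split `str`
# according to the two rules, for a limit `n`?
def valid_split?(str, pieces, n)
  # Each piece must be non-empty words joined by single spaces.
  return false unless pieces.all? { |p| !p.empty? && p == p.split.join(" ") }
  # The pieces must contain exactly the original words, in order.
  return false unless pieces.flat_map(&:split) == str.split

  pieces.each_with_index.all? do |piece, i|
    single_long_word = piece.split.size == 1 && piece.size >= n
    short_group      = piece.size <= n
    # Rule check: a lone word of n+ characters, or a group of at most n.
    next false unless single_long_word || short_group

    # Maximality: a group shorter than n must not be able to absorb the
    # first word of the next piece (a piece of length >= n already can't).
    following = pieces[i + 1]
    next true if following.nil? || piece.size >= n

    "#{piece} #{following.split.first}".size > n
  end
end

str = "abcdefg fg hijkl mno pqrs tuv wx yz"
valid_split?(str, ["abcdefg", "fg", "hijkl", "mno", "pqrs", "tuv", "wx yz"], 5)
  #=> true
valid_split?(str, ["abcdefg", "fg", "hijkl", "mno", "pqrs", "tuv", "wx", "yz"], 5)
  #=> false ("wx" and "yz" fit together in 5 characters)
```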

Cary Swoveland
  • Also note http://stackoverflow.com/questions/2312153/wrapping-text-into-lines-at-word-boundaries, which is not the same but similar. – sawa Feb 20 '15 at 02:14
  • Don't use regex for this - you're asking for a solution that both looks ahead and back at a word or group of words while keeping track of the length of a line. Keep in mind that someone will have to maintain this in the future - whatever regex you come up with is almost sure to be unreadable and unmaintainable. – matt Feb 20 '15 at 02:28

2 Answers


I believe this does it:

str = "abcdefg fg hijkl e mn pqrs tuv wx yz"

str.scan(/\b(?:\w{5,}|\w[\w\s]{0,3}\w|\w)\b/)
  #=> ["abcdefg", "fg", "hijkl", "e mn", "pqrs", "tuv", "wx yz"] 
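The limit 5 is hard-coded in the regex above (`\w{5,}`, and `{0,3}` because 5 - 2 = 3 characters sit between the first and last word characters of a group). A sketch that interpolates an arbitrary limit `n` might look like this; `split_words` is a hypothetical name, and it assumes n >= 2 and that words consist only of `\w` characters (letters, digits, underscore), as the original regex does. Because `[\w\s]{0,n-2}` is greedy and only backs off until the match ends at a word boundary, each `scan` match packs in as many whole words as fit.

```ruby
# Hypothetical generalization of the regex above to a limit n (n >= 2).
def split_words(str, n)
  # \w{n,}                a single word of n or more characters
  # \w[\w\s]{0,n-2}\w     at most n characters, starting and ending with a
  #                       word character (so it may span several words)
  # \w                    a lone one-character word
  re = /\b(?:\w{#{n},}|\w[\w\s]{0,#{n - 2}}\w|\w)\b/
  str.scan(re)
end

split_words("abcdefg fg hijkl e mn pqrs tuv wx yz", 5)
  #=> ["abcdefg", "fg", "hijkl", "e mn", "pqrs", "tuv", "wx yz"]
split_words("ab cd e fgh", 3)
  #=> ["ab", "cd", "e", "fgh"]
```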
Cary Swoveland

As you iterate through the words in your collection (splitting the original string up into words should be trivial), it seems like there are three possible scenarios:

  1. The current line is blank, so we start it with the current word
  2. The current line is non-blank, and the word fits on it
  3. The current line is non-blank, but the word doesn't fit, so it should start a new line

Something like this should work (note: I haven't tested this much outside of your solution, so you'll definitely want to do that):

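# Hypothetical wrapper method so the loop can run on its own;
# word_can_fit_line? is the helper described in the note that follows.
def wrap_words(str, length)
  words = str.split
  lines = []
  line  = +""  # unary + gives a mutable (unfrozen) empty string

  words.each do |word|
    # String#blank? comes from Rails (ActiveSupport); plain Ruby uses empty?
    if line.empty?
      # this is a new line, so start it with the current word
      line << word
    elsif word_can_fit_line?(line, word, length)
      # the word fits, so append it to the current line
      line << " #{word}"
    else
      # the word doesn't fit, so keep this line and start a new one with
      # the current word (dup'd so later appends don't modify `words`)
      lines << line
      line = word.dup
    end
  end

  # add the last line (if there were any words) and we're done
  lines << line unless line.empty?

  lines
end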

Note that the implementation of word_can_fit_line? should be trivial - you just want to see if the current line length, plus a space, plus the word length, is less than or equal to your desired line length.
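Following that description, a minimal sketch of the helper might be the following. The signature `word_can_fit_line?(line, word, length)` comes from the call in the loop; the body is an assumption based on the note above, and it counts one separating space.

```ruby
# Hypothetical helper: the current line, one separating space, and the
# word must together fit within `length` characters.
def word_can_fit_line?(line, word, length)
  line.length + 1 + word.length <= length
end

word_can_fit_line?("wx", "yz", 5)     #=> true  ("wx yz" has 5 characters)
word_can_fit_line?("fg", "hijkl", 5)  #=> false ("fg hijkl" has 8)
```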

matt