
I need to split a string into chunks of a specific maximum size. I can't break words between chunks, so I need to detect when adding the next word would go over the chunk size and start the next chunk instead (it's OK if a chunk is shorter than the specified size).

Here is my working code, but I would like to find a more elegant way to do this.

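def split_into_chunks_by_size(chunk_size, string)
  string_split_into_chunks = [""]
  string.split(" ").each do |word|
    current = string_split_into_chunks[-1]
    if current.empty?
      # Very first word: no leading space (and no empty first chunk)
      current << word
    elsif current.length + 1 + word.length > chunk_size
      # Adding " word" would overflow this chunk, so start a new one
      string_split_into_chunks << word
    else
      # The word fits: append it after a single space
      current << " " + word
    end
  end
  string_split_into_chunks
end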
sawa
psychickita
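For comparison, the same greedy word-packing idea can be written with `each_with_object`, which removes the manual indexing into the result array. This is only a sketch: `split_into_chunks_greedy` is a hypothetical name, and, like the loop in the question, it keeps a word longer than `chunk_size` whole in its own chunk.

```ruby
# Hypothetical helper: greedily pack whole words into chunks of at most
# chunk_size characters; a word longer than chunk_size gets its own chunk.
def split_into_chunks_greedy(chunk_size, string)
  string.split.each_with_object([]) do |word, chunks|
    if chunks.empty? || chunks.last.length + 1 + word.length > chunk_size
      chunks << word               # start a new chunk with this word
    else
      chunks.last << " " << word   # word fits: append it after one space
    end
  end
end

p split_into_chunks_greedy(10, "the quick brown fox jumps")
# => ["the quick", "brown fox", "jumps"]
```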

2 Answers

How about:

str = "split a string into chunks according to a specific size. Seems easy enough, but here is the catch: I cannot be breaking words between chunks, so I need to catch when adding the next word will go over chunk size and start the next one (its ok if a chunk is less than specified size)." 
str.scan(/.{1,25}\W/)
=> ["split a string into ", "chunks according to a ", "specific size. Seems easy ", "enough, but here is the ", "catch: I cannot be ", "breaking words between ", "chunks, so I need to ", "catch when adding the ", "next word will go over ", "chunk size and start the ", "next one (its ok if a ", "chunk is less than ", "specified size)."]
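Because every match of this first pattern must end with a `\W` character, trailing text that is not followed by one is silently dropped. A minimal check with a hypothetical two-word input:

```ruby
# The original pattern requires a non-word character (\W) after every chunk.
chunks = "hello world".scan(/.{1,25}\W/)
p chunks
# => ["hello "]   ("world" is lost because nothing follows it)
```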

Update after @sawa's comment:

str.scan(/.{1,25}\b|.{1,25}/).map(&:strip)

This is better because it doesn't require the string to end with a \W character.

It also handles words longer than the specified length: it splits them, which I assume is the desired behaviour.
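The updated pattern can be wrapped in a method that takes the chunk size as a parameter by interpolating it into the regular expression. A sketch; the name `chunk_by_word_boundary` is hypothetical:

```ruby
# Hypothetical wrapper around the updated pattern: prefer to end each chunk at a
# word boundary (\b); if no boundary fits, fall back to a hard cut of `size` chars.
def chunk_by_word_boundary(string, size)
  string.scan(/.{1,#{size}}\b|.{1,#{size}}/).map(&:strip)
end

p chunk_by_word_boundary("the quick brown fox jumps", 10)
# => ["the quick", "brown fox", "jumps"]
p chunk_by_word_boundary("abcdefghijklmno xy", 10)
# => ["abcdefghij", "klmno xy"]
```

The second call shows the long-word behaviour described above: `abcdefghijklmno` has no word boundary within the first 10 characters, so the second alternative cuts it after 10 characters.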

Yuri Golobokov
  • Works great, thanks a lot! One more thing: can we trim trailing spaces right in here? – psychickita Mar 14 '13 at 05:23
    This is close to good, but it always requires a `\W` character at the end. In your particular example, it worked because of the `)` and `.` at the end, but without it, it won't work. Each chunk also necessarily ends with a `\W` character when it does not have to. – sawa Mar 14 '13 at 06:20
  • Thank you for pointing this out. Updated the answer to solve these problems. – Yuri Golobokov Mar 14 '13 at 07:10
    I ended up using `/.{1,25}\z|.{1,25}\s|.{1,25}/`. The `/.{1,25}\z` being a "catch all" at the end of the string. – XML Slayer Apr 17 '18 at 15:49
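XML Slayer's variant tries, in order: the rest of the string if it fits (`\z`), a cut at whitespace (`\s`), and finally a hard cut. A parameterized sketch follows; the method name is hypothetical, and the `rstrip` is an addition, because the matched whitespace stays on the chunk, which can therefore be one character longer than `size` until it is stripped.

```ruby
# Hypothetical wrapper: take the rest of the string if it fits, else break at
# whitespace, else hard-cut after `size` characters; trailing whitespace removed.
def chunk_at_whitespace(string, size)
  string.scan(/.{1,#{size}}\z|.{1,#{size}}\s|.{1,#{size}}/).map(&:rstrip)
end

p chunk_at_whitespace("the quick brown fox jumps", 10)
# => ["the quick", "brown fox", "jumps"]
```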

@Yuriy, your alternation looks like trouble. How about:

str.scan(/\S.{1,24}(?!\S)/)
#=> ["split a string into", "chunks according to a", "specific size. Seems easy", "enough, but here is the", "catch: I cannot be", "breaking words between", "chunks, so I need to", "catch when adding the", "next word will go over", "chunk size and start the", "next one (its ok if a", "chunk is less than", "specified size)."]
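The lookahead idea can also be parameterized. In this sketch (method name hypothetical) the quantifier is `{0,size - 1}` rather than `{1,...}`, an adjustment not in the answer, so that a one-character word left on its own is still matched instead of dropped.

```ruby
# Hypothetical wrapper: each chunk starts with a non-space character, is at most
# `size` characters long, and must be followed by whitespace or end of string.
def chunk_by_lookahead(string, size)
  string.scan(/\S.{0,#{size - 1}}(?!\S)/)
end

p chunk_by_lookahead("the quick brown fox jumps", 10)
# => ["the quick", "brown fox", "jumps"]
p chunk_by_lookahead("ab c", 2)
# => ["ab", "c"]
```

As a caveat, this pattern never splits a word longer than `size`: `scan` moves past the word's first characters, which are dropped (`chunk_by_lookahead("abcdefghijklmno xy", 10)` returns `["fghijklmno", "xy"]`).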
pguardiario