
I would like to replace spaces with linebreaks (\n) in a pretty long character vector in R. However, I don't want to replace every space, only those where the preceding substring exceeds a certain number of characters (n).

Example:

mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks" 

Now I want to insert a linebreak in mystring at a space, but only once the substring since the previous linebreak is longer than 20 characters (nchar > 20).

Hence, the resulting string is supposed to look like this:

"this string is annoyingly\nlong and therefore I would\nlike to insert linebreaks"

Linebreaks (\n) were inserted after 25 and 26 characters, so the three resulting lines are 25, 26 and 25 characters long.
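Those line lengths can be sanity-checked in R by splitting the expected result on the newline and counting characters. The names `result` and `line_lengths` are only illustrative.

```r
# Expected output from the question
result <- "this string is annoyingly\nlong and therefore I would\nlike to insert linebreaks"

# Split on the literal newline and count the characters of each line
line_lengths <- nchar(strsplit(result, "\n", fixed = TRUE)[[1]])
print(line_lengths)  # 25 26 25
```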

How can I achieve this? Maybe something combining gsub and strsplit?

chamaoskurumi

1 Answer


You may use the `.{21,}?\s` regex to match 21 or more characters (since nchar > 20), but as few as possible, up to the first whitespace that follows them:

> gsub("(.{21,}?)\\s", "\\1\n", mystring)
[1] "this string is annoyingly\nlong and therefore I would\nlike to insert linebreaks"
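To make the threshold a parameter, the pattern can be built with sprintf(). This is a sketch: `wrap_after` is a hypothetical helper name, and it passes perl = TRUE so that the lazy quantifier is handled by PCRE (the call above uses R's default regex engine). Because gsub() is vectorised, it also works on a whole character vector.

```r
# Hypothetical helper: replace the first whitespace that follows more than
# n characters (counted from the start or the previous break) with "\n"
wrap_after <- function(x, n) {
  pattern <- sprintf("(.{%d,}?)\\s", as.integer(n) + 1L)
  gsub(pattern, "\\1\n", x, perl = TRUE)
}

mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks"
wrapped <- wrap_after(c(mystring, "short text"), 20)
```

An element such as "short text" comes back unchanged, since it never reaches 21 characters before a whitespace.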

Details:

  • (.{21,}?) - Group 1, capturing 21 or more characters, but as few as possible (as {21,}? is a lazy quantifier)
  • \\s - a whitespace character
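To see what the pattern actually consumes, the matches can be listed with gregexpr() and regmatches(). This sketch again assumes perl = TRUE. Each match includes the trailing whitespace, which gsub() drops because it lies outside Group 1.

```r
mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks"

# All non-overlapping matches of the lazy pattern, left to right
matches <- regmatches(mystring, gregexpr(".{21,}?\\s", mystring, perl = TRUE))[[1]]
print(matches)
# "this string is annoyingly " and "long and therefore I would "
print(nchar(matches))  # 26 27 (each includes the matched space)
```

The last 25 characters, "like to insert linebreaks", are not matched because no whitespace follows them.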

The replacement contains the backreference to Group 1 (\\1), which reinserts the text before the whitespace, followed by a newline character (feel free to add a carriage return, \r, too, if needed).
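Since the question also mentions strsplit, here is a regex-free alternative for comparison: split the string into words and start a new line as soon as the current line exceeds n characters. It is a sketch that assumes words are separated by single spaces and handles one string at a time; `break_long_lines` is a hypothetical name.

```r
# Hypothetical helper: word-based version of the same rule
break_long_lines <- function(x, n = 20) {
  words <- strsplit(x, " ", fixed = TRUE)[[1]]
  lines <- character(0)
  current <- ""
  for (w in words) {
    # Append the word to the line being built
    current <- if (nzchar(current)) paste(current, w) else w
    # Close the line once it is longer than n characters
    if (nchar(current) > n) {
      lines <- c(lines, current)
      current <- ""
    }
  }
  # Keep any leftover words as the final line
  if (nzchar(current)) lines <- c(lines, current)
  paste(lines, collapse = "\n")
}

mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks"
broken <- break_long_lines(mystring, 20)
```

On mystring this gives the same result as the gsub() call, but the regex version is shorter and vectorised.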

Wiktor Stribiżew
  • Why are the comma and the question mark needed here? I believe it would work to match 21 characters up to the space, eliminating the need for the comma. I don't understand what the question mark is accomplishing. – jtr13 Oct 01 '19 at 22:59
  • @jtr13 It is explained in the answer: *any 21 chars or more, but as few as possible (as {21,}? is a lazy quantifier)*. `{21,}?` matches 21 or more, but as few as possible, that is, `.{21,}?` will grab any 21 chars and then any chars up to the first whitespace. – Wiktor Stribiżew Oct 01 '19 at 23:23
  • Right, but what I'm asking is why you need the "or more"... can't you do `.{21}\s` and exactly match the 21 characters immediately before the space? – jtr13 Oct 02 '19 at 11:28
  • 1
    @jtr13 You can use `.{21}\s` if it is your intention. OP had a different issue. – Wiktor Stribiżew Oct 02 '19 at 11:35
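The effect of the question mark discussed in these comments can be checked by running the lazy, greedy and exact-count variants side by side. This sketch again assumes perl = TRUE.

```r
mystring <- "this string is annoyingly long and therefore I would like to insert linebreaks"

# Lazy (the answer's pattern): stops at the FIRST whitespace after 21+ chars
lazy <- gsub("(.{21,}?)\\s", "\\1\n", mystring, perl = TRUE)

# Greedy (no "?"): grabs as much as possible, then backtracks to the LAST whitespace
greedy <- gsub("(.{21,})\\s", "\\1\n", mystring, perl = TRUE)

# Exactly 21 chars before a whitespace (the pattern suggested in the comments)
exact <- gsub("(.{21})\\s", "\\1\n", mystring, perl = TRUE)

print(greedy)
# "this string is annoyingly long and therefore I would like to insert\nlinebreaks"
print(identical(lazy, exact))  # TRUE for this particular string
```

Without the `?`, only one linebreak is inserted, before the last word. On this particular string the exact-count pattern happens to give the same result as the lazy one, because gsub() leaves the unmatched text before each match in place, so both patterns end up replacing the same two spaces.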