39

Suppose I have a character vector like

"Hi,  this is a   good  time to   start working   together.". 

I just want to have

" Hi, this is a good time to start working together." 

That is, only one space between any two words. How should I do this in R?

David Arenburg
Smith Black

4 Answers

58

gsub is your friend:

test <- "Hi,  this is a   good  time to   start working   together."
gsub("\\s+"," ",test)
#[1] "Hi, this is a good time to start working together."

\\s+ matches one or more consecutive whitespace characters (space, tab, newline, etc.), and gsub replaces each such run with a single space " ".
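As a small illustration of that behaviour, here is a sketch with made-up input (the vector `x` is hypothetical, not from the question). It shows that tabs and newlines are collapsed too, that gsub works element-wise on a vector, and that leading or trailing runs are reduced to one space rather than removed:

```r
# Hypothetical input: mixed spaces, tabs and a newline, plus a padded string
x <- c("Hi,\t\tthis  is \n a test.", "  padded   on both   sides  ")

# Every run of whitespace becomes exactly one space
collapsed <- gsub("\\s+", " ", x)
print(collapsed)
#[1] "Hi, this is a test."   " padded on both sides "
```

Note that the second element still has one leading and one trailing space.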

thelatemail
29

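Another option is the str_squish function from the stringr package, which collapses repeated whitespace inside the string and also trims leading and trailing whitespace: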

library(stringr)
string <- "Hi,  this is a   good  time to   start working   together."
str_squish(string)
#[1] "Hi, this is a good time to start working together."
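If adding a dependency is not wanted, a similar result can probably be had in base R by combining gsub with trimws. This is only a rough sketch: `squish_base` is a hypothetical helper name, and it is assumed (not verified against the stringr source) to behave like str_squish for ordinary ASCII whitespace:

```r
# Hypothetical base-R stand-in for str_squish:
# collapse whitespace runs, then strip leading/trailing whitespace
squish_base <- function(x) trimws(gsub("\\s+", " ", x))

squished <- squish_base("  Hi,  this is a   good  time.  ")
print(squished)
#[1] "Hi, this is a good time."
```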
Koot6133
3

Since the title of the question is "remove the extra whitespace between words", here is a solution that leaves leading and trailing whitespace untouched (assuming "words" are chunks of non-whitespace characters):

gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)\\s{2,}(?=\\S)", "\\1 ")
## Or, if the whitespace to keep is the last whitespace char in each match
gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)(\\s){2,}(?=\\S)", "\\1\\2")

See regex demo #1 and regex demo #2 and this R demo.

Regex details:

  • (\S) - Capturing group 1 (\1 refers to this group value from the replacement pattern): a non-whitespace char
  • \s{2,} - two or more whitespace chars (in Regex #2, it is wrapped with parentheses to form a capturing group with ID 2 (\2))
  • (?=\S) - a positive lookahead that requires a non-whitespace char immediately to the right of the current location.
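To make the difference between the two patterns concrete, here is a short sketch on made-up strings (`text` and `text2` are hypothetical inputs). The first call shows that leading and trailing whitespace survive; the second pair shows what changes when the last matched whitespace char is kept instead of a literal space:

```r
# Hypothetical input with leading/trailing spaces and a tab+newline run
text <- "  Hi,  this is a\t\n good  time.  "

# Regex #1: each inner run of 2+ whitespace chars becomes one literal space
inner_only <- gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl = TRUE)
print(inner_only)
#[1] "  Hi, this is a good time.  "

# A run of space + tab between two letters
text2 <- "a \tb"
with_space <- gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text2, perl = TRUE)
with_last  <- gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", text2, perl = TRUE)
# with_space is "a b"; with_last keeps the tab, i.e. "a\tb"
```

The leading and trailing runs are skipped because the pattern requires a non-whitespace char on both sides of the run.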
Wiktor Stribiżew
1

The package textclean has many useful tools for processing text. replace_white would be useful here:

v <- "Hi,  this is a   good  time to   start working   together."

textclean::replace_white(v)
# [1] "Hi, this is a good time to start working together."
LMc