How to remove extra white space between words inside a character vector in R?

Question

Suppose I have a character vector like

"Hi,  this is a   good  time to   start working   together.".

I just want to have

"Hi, this is a good time to start working together."

with only one space between words. How should I do this in R?

thelatemail · Accepted Answer · 2013-10-02T01:34:32.293

58

gsub is your friend:

test <- "Hi,  this is a   good  time to   start working   together."
gsub("\\s+"," ",test)
#[1] "Hi, this is a good time to start working together."

\\s+ matches any run of one or more whitespace characters (space, tab, etc.), and gsub replaces each run with a single space " ".
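Since the question mentions a character vector, here is a small sketch showing that gsub is vectorised, so the same call handles several strings at once and also collapses tabs. The vector `x` is a made-up example, not from the question.

```r
# Hypothetical input: three strings, one containing tabs
x <- c("Hi,  this is", "a\t\tgood   time", "to start")

# Each run of whitespace in each element becomes a single space
collapsed <- gsub("\\s+", " ", x)
collapsed
#[1] "Hi, this is" "a good time" "to start"
```

Note that a run of leading or trailing whitespace is reduced to one space rather than removed.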

edited Oct 02 '13 at 01:34

answered Oct 02 '13 at 00:56


I was using `"\\s"` in the second argument instead of `" "`, thanks! – MichaelChirico Jul 12 '15 at 21:49
But using `"\\s"` instead of `" "` as the second argument removes all the spaces and runs all the letters together. – bim Mar 08 '16 at 18:26
@thelatemail Can you suggest faster code than this? – AMS Jul 20 '20 at 18:12

Koot6133 · Answer 2 · 2019-12-18T13:00:17.050

29

Another option is the str_squish function from the stringr package:

library(stringr)
string <- "Hi,  this is a   good  time to   start working   together."
str_squish(string)
#[1] "Hi, this is a good time to start working together."
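Worth knowing: besides collapsing repeated whitespace inside the string, str_squish also strips whitespace from the start and end. If you'd rather not load stringr, a rough base R equivalent could look like the sketch below. The helper name `squish_base` is made up for illustration, and it is only meant to approximate str_squish for ordinary spaces and tabs.

```r
# Hypothetical helper: collapse whitespace runs, then trim both ends
squish_base <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

padded <- "  Hi,  this is a   good  time.  "
result <- squish_base(padded)
result
#[1] "Hi, this is a good time."
```

trimws is in base R from version 3.2.0 onwards.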

edited Dec 18 '19 at 13:00

answered Oct 01 '19 at 13:36


This is easier than other methods. – Suat Atan PhD Mar 25 '20 at 19:30

score 3 · Answer 3 · answered Mar 06 '21 at 12:25

Since the title of the question is "remove the extra whitespace between words", without touching the leading and trailing whitespaces, the answer is (assuming the "words" are non-whitespace character chunks)

gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)\\s{2,}(?=\\S)", "\\1 ")
## Or, if the whitespace to keep is the last whitespace character in each match
gsub("(\\S)(\\s){2,}(?=\\S)", "\\1\\2", text, perl=TRUE)
stringr::str_replace_all(text, "(\\S)(\\s){2,}(?=\\S)", "\\1\\2")

See regex demo #1 and regex demo #2 and this R demo.

Regex details:

(\S) - Capturing group 1 (\1 refers to this group value from the replacement pattern): a non-whitespace char
\s{2,} - two or more whitespace chars (in Regex #2, it is wrapped with parentheses to form a capturing group with ID 2 (\2))
(?=\S) - a positive lookahead that requires a non-whitespace char immediately to the right of the current location.
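To make the difference from a plain "\\s+" replacement concrete, here is a small sketch on a made-up string with padding on both ends. The variable names are just for illustration.

```r
# Hypothetical input with leading and trailing whitespace
text <- "  Hi,  this is   good.   "

# Only runs of 2+ whitespace chars sitting between two non-whitespace chars shrink
between_only <- gsub("(\\S)\\s{2,}(?=\\S)", "\\1 ", text, perl = TRUE)
between_only
#[1] "  Hi, this is good.   "

# For comparison, "\\s+" also shrinks the leading and trailing runs
everywhere <- gsub("\\s+", " ", text)
everywhere
#[1] " Hi, this is good. "
```

The leading run has no non-whitespace char before it and the trailing run has none after it, so neither can match the first pattern.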

score 1 · Answer 4 · answered Jul 12 '22 at 17:04

The package textclean has many useful tools for processing text. replace_white would be useful here:

v <- "Hi,  this is a   good  time to   start working   together."

textclean::replace_white(v)
# [1] "Hi, this is a good time to start working together."
