Merge Multiple spaces to single space; remove trailing/leading spaces

Question

I want to merge multiple spaces into a single space (the whitespace could also be a tab) and remove leading/trailing spaces.

For example...

string <- "Hi        buddy        what's up    Bro"

to

"Hi buddy what's up bro"

I checked the solution given at "Regex to replace multiple spaces with a single space". Please don't put a literal \t or \n inside the toy string and then feed that to gsub as the pattern. I want a solution in R.

Note that I am unable to put multiple spaces in the toy string. Thanks.

If you read the end of my question carefully, you can create a toy string with multiple spaces and then reply. As I said above, I'm unable to put multiple spaces in the toy string because Stack Overflow removed them automatically from my question. — CKM, Sep 07 '14 at 08:27
Hi David, that works for me. But can you explain what the pattern is doing exactly, i.e. ^ *|(?<= ) | *$? It says to replace everything with a space " ", but what about *|(?<=)|*$? Is that correct? How does it solve my problem? I want to know. — CKM, Sep 07 '14 at 08:44
See [here](http://rick.measham.id.au/paste/explain.pl?regex=%5E+*%7C%28%3F%3C%3D+%29+%7C+*%24) — David Arenburg, Sep 07 '14 at 08:45
I voted to reopen. This one is slightly more involved looking at leading, trailing, and multiple spaces. — Tyler Rinker, Sep 08 '14 at 23:07
@TylerRinker, the code I provided works perfectly fine in this case, so how isn't this a duplicate? — David Arenburg, Sep 09 '14 at 11:07
@DavidArenburg The answer you gave works but the guidelines for closing regard questions. That question (I believed; though could be mistaken) was different (I can't find it now) in that it wanted multiple spaces and leading. This asks for multiple spaces and leading/trailing. Again I may have missed something in that previous post, but I didn't believe the 2 questions to be exact duplicates. — Tyler Rinker, Sep 09 '14 at 12:08
@TylerRinker, even if this is not exact dupe, the answer there solves this question, but whatever — David Arenburg, Sep 09 '14 at 12:17

Rich Scriven · Accepted Answer · 2015-11-21T18:18:17.407

80

This seems to meet your needs.

string <- "  Hi buddy   what's up   Bro "
library(stringr)
str_replace(gsub("\\s+", " ", str_trim(string)), "B", "b")
# [1] "Hi buddy what's up bro"
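
Since the question mentions that the whitespace may include tabs, here is a small base R sketch (the sample string `messy` is made up for illustration) showing that `\\s+` matches runs of spaces, tabs, and newlines alike, so the same `gsub` call collapses all of them:

```r
# Hypothetical sample input mixing spaces, tabs, and a newline.
messy <- "Hi \t\t buddy\n  what's up"

# "\\s+" matches one or more whitespace characters of any kind,
# so each run is replaced by exactly one space.
collapsed <- gsub("\\s+", " ", messy)
print(collapsed)
```

Leading and trailing whitespace would still need a separate trim step, as in the answer above.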

edited Nov 21 '15 at 18:18

answered Sep 07 '14 at 07:55

Rich Scriven

Thank you for your reply. In fact, I only wanted the gsub part, as I didn't mean to replace B with b. Where I got stuck was finding the pattern for this. Could you please explain the meaning of \\s+? – CKM Sep 07 '14 at 08:33
@chandresh - `\\s+` means "one or more whitespace characters" (spaces, tabs, newlines) – Rich Scriven Sep 11 '14 at 17:12
Worth noting at this point that this is the only answer that addresses changing the upper case b in `Bro` to lower case, as is shown in the desired result of the question. – Rich Scriven May 02 '17 at 12:06
@RichScriven I don't need to lower case, how do I keep the case? – Herman Toothrot Jun 07 '17 at 17:33

Henrik · Answer 2 · 2019-09-20T13:10:26.567

71

Or simply try the str_squish function from stringr:

library(stringr)
string <- "  Hi buddy   what's up   Bro "
str_squish(string)
# [1] "Hi buddy what's up Bro"

edited Sep 20 '19 at 13:10

answered Apr 26 '18 at 08:33

Henrik

A modern approach, as str_squish came out in 2018, and is hidden under str_trim – Ben Allen Mar 02 '23 at 21:42

score 43 · Answer 3 · edited Dec 25 '15 at 02:31

Another approach using a single regex:

gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", string, perl=TRUE)

Explanation (from)

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    [\s]                     any character of: whitespace (\n, \r,
                             \t, \f, and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
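--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string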
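
As a quick check of the pattern explained above, the following base R sketch applies it to two made-up strings. One caveat worth noting: because the look-behind keeps the first whitespace character of each run, a run that starts with a tab stays a tab rather than becoming a space.

```r
# The single-regex pattern from this answer.
pattern <- "(?<=[\\s])\\s*|^\\s+|\\s+$"

# Made-up inputs for illustration.
spaces_only <- "  Hi   buddy  "
tab_first   <- "a\t b"

# Leading/trailing runs are removed; inner runs shrink to their first character.
res_spaces <- gsub(pattern, "", spaces_only, perl = TRUE)

# The tab is the first character of the run, so it is the one that survives.
res_tab <- gsub(pattern, "", tab_first, perl = TRUE)

print(res_spaces)
print(res_tab)
```

If tabs should also be turned into plain spaces, replacing every `\\s+` run with `" "` and trimming afterwards is probably the simpler route.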

Why not use a much simpler REGEX, like Adam Erickson's? – Rodrigo Sep 13 '20 at 00:35

score 33 · Answer 4 · answered Aug 14 '18 at 21:56

You do not need to import external libraries to perform such a task:

string <- " Hi        buddy        what's up    Bro "
string <- gsub("\\s+", " ", string)
string <- trimws(string)
string
# [1] "Hi buddy what's up Bro"

Or, in one line:

string <- trimws(gsub("\\s+", " ", string))

Much cleaner.
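
If this is needed in several places, the two base R steps can be wrapped in a small helper. This is only a sketch; the name `squish_ws` is hypothetical and not part of base R. Both `gsub` and `trimws` are vectorized, so the helper also works on character vectors.

```r
# Hypothetical helper: collapse whitespace runs to one space, then trim both ends.
squish_ws <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

# Made-up inputs; the second one contains tabs.
inputs <- c(" Hi        buddy        what's up    Bro ", "\tone \t two  ")
cleaned <- squish_ws(inputs)
print(cleaned)
```

Collapsing first and trimming second means `trimws` only ever sees single spaces at the ends, whatever whitespace the input started with.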

Adam Erickson

This doesn't depend on any external libraries, and it's also not a nightmarish REGEX like Tyler Rinker's. I wonder why you don't have more upvotes. – Rodrigo Sep 13 '20 at 00:34
I also wonder why @heisenbug47 exactly duplicated my answer half a year later. – Adam Erickson Dec 15 '21 at 15:23

score 6 · Answer 5 · answered Sep 29 '14 at 02:23

The qdapRegex package has the rm_white function to handle this:

library(qdapRegex)
rm_white(string)

## [1] "Hi buddy what's up Bro"

Tyler Rinker

akrun · Answer 6 · 2014-09-09T03:07:49.157

4

You could also try clean from qdap

library(qdap)
library(stringr)
str_trim(clean(string))
#[1] "Hi buddy what's up Bro"

Or as suggested by @Tyler Rinker (using only qdap)

Trim(clean(string))
#[1] "Hi buddy what's up Bro"

edited Sep 09 '14 at 03:07

answered Sep 07 '14 at 09:07

akrun

You could do all from within `qdap` via `Trim(clean(string))`. – Tyler Rinker Sep 08 '14 at 23:01

score 1 · Answer 7 · answered Jan 08 '19 at 10:16

There is no need to load or remember any extra libraries for this, because gsub() from base R does the work. Remove leading and trailing whitespace with trimws() and replace the extra whitespace using gsub(), as mentioned by @Adam Erickson.

string <- " Hi        buddy        what's up    Bro "
trimws(gsub("\\s+", " ", string))

Here \\s+ matches one or more whitespace characters, and gsub replaces each such run with a single space.

To find out what any regular expression is doing, visit this link, as mentioned by @Tyler Rinker.
Just paste in the regular expression you want explained and it will do the rest.

Sam S. · Answer 8 · 2019-03-07T03:30:20.473

Another solution using strsplit:

Split the text into words, drop the empty pieces, and then concatenate the remaining words with the paste function.

string <- "Hi        buddy        what's up    Bro" 
stringsplit <- sapply(strsplit(string, " "), function(x){x[!x ==""]})
paste(stringsplit ,collapse = " ")

For more than one document:

string <- c("Hi        buddy        what's up    Bro"," an  example using       strsplit ") 
stringsplit <- lapply(strsplit(string, " "), function(x){x[!x ==""]})
sapply(stringsplit ,function(d) paste(d,collapse = " "))
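
Splitting on a single literal space leaves tabs and newlines untouched. A possible variant, sketched below with a hypothetical helper name `squish_split`, splits on the regex `\\s+` instead and drops the empty leading element that `strsplit` produces when a string starts with whitespace.

```r
# Hypothetical helper: split on any whitespace run, then re-join with one space.
squish_split <- function(x) {
  pieces <- strsplit(x, "\\s+")
  # nzchar() filters out the "" that appears when a string begins with whitespace.
  vapply(pieces, function(w) paste(w[nzchar(w)], collapse = " "), character(1))
}

# Made-up documents; the first one contains a tab.
docs <- c("Hi \t buddy   what's up", " an  example using       strsplit ")
result <- squish_split(docs)
print(result)
```

`vapply` with `character(1)` is used so that the result is always a character vector with one element per input string.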

score 0 · Answer 9 · answered Jul 22 '21 at 21:35

This seems to work.
Unlike Rich Scriven's answer, it doesn't remove whitespace at the beginning or end of the sentence, but it does merge multiple whitespace characters.

library("stringr")
string <- "Hi     buddy     what's      up       Bro"
str_replace_all(string, "\\s+", " ")
# [1] "Hi buddy what's up Bro"
