91

I want to merge multiple spaces into a single space (a space could also be a tab) and remove leading/trailing spaces.

For example...

string <- "Hi        buddy        what's up    Bro" 

to

"Hi buddy what's up bro"

I checked the solution given at "Regex to replace multiple spaces with a single space". Note: please don't put \t or \n as a literal whitespace character inside the toy string and feed that as the pattern to gsub. I want a solution in R.

Note that I am unable to put multiple spaces in the toy string. Thanks.

Community
CKM
  • If you read my question carefully to the end, you can create a toy string with multiple spaces and then reply. I said above that I'm unable to put multiple spaces in the toy string because Stack Overflow automatically removed them from my post. – CKM Sep 07 '14 at 08:27
  • 11
    `gsub("^ *|(?<= ) | *$", "", x, perl = TRUE)` – David Arenburg Sep 07 '14 at 08:32
  • Hi David, that works for me. But can you explain what the pattern `^ *|(?<= ) | *$` is doing exactly? As I read it, it replaces whatever it matches, but what do the parts `^ *`, `(?<= ) ` and ` *$` mean? Is that correct, and how does it solve my problem? I want to know. – CKM Sep 07 '14 at 08:44
  • 2
    See [here](http://rick.measham.id.au/paste/explain.pl?regex=%5E+*%7C%28%3F%3C%3D+%29+%7C+*%24) – David Arenburg Sep 07 '14 at 08:45
  • I voted to reopen. This one is slightly more involved looking at leading, trailing, and multiple spaces. – Tyler Rinker Sep 08 '14 at 23:07
  • @TylerRinker, the code I provided works perfectly fine in this case, so how isn't this a duplicate? – David Arenburg Sep 09 '14 at 11:07
  • 1
    @DavidArenburg The answer you gave works but the guidelines for closing regard questions. That question (I believed; though could be mistaken) was different (I can't find it now) in that it wanted multiple spaces and leading. This asks for multiple spaces and leading/trailing. Again I may have missed something in that previous post, but I didn't believe the 2 questions to be exact duplicates. – Tyler Rinker Sep 09 '14 at 12:08
  • @TylerRinker, even if this is not exact dupe, the answer there solves this question, but whatever – David Arenburg Sep 09 '14 at 12:17
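
For anyone with the same question as in the comment above, here is a rough sketch of how the three alternatives in David Arenburg's pattern behave, first separately and then combined. The sample string `x` and the variable names are made up for illustration. Note that this pattern only targets the literal space character, so tabs would be left alone.

```r
# Hypothetical sample input with leading, repeated, and trailing spaces
x <- "  Hi   buddy  what's up   Bro  "

# "^ *"     : zero or more spaces at the start of the string
leading_only  <- gsub("^ *", "", x, perl = TRUE)

# "(?<= ) " : a space that is itself preceded by a space (look-behind),
#             so the first space of every run is kept
inner_only    <- gsub("(?<= ) ", "", x, perl = TRUE)

# " *$"     : zero or more spaces at the end of the string
trailing_only <- gsub(" *$", "", x, perl = TRUE)

# All three alternatives joined with "|" in a single pass
combined      <- gsub("^ *|(?<= ) | *$", "", x, perl = TRUE)
combined
# [1] "Hi buddy what's up Bro"
```

The look-behind needs `perl = TRUE`, because R's default regex engine does not support it.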

9 Answers

80

This seems to meet your needs.

string <- "  Hi buddy   what's up   Bro "
library(stringr)
str_replace(gsub("\\s+", " ", str_trim(string)), "B", "b")
# [1] "Hi buddy what's up bro"
Rich Scriven
  • Thank you for your reply. In fact, I wanted only the gsub part, as I didn't mean to replace B with b. Where I got stuck was finding the pattern for this. Could you please explain the meaning of \\s+? – CKM Sep 07 '14 at 08:33
  • 7
    @chandresh - `\\s+` means "one or more whitespace characters" (spaces, tabs, newlines) – Rich Scriven Sep 11 '14 at 17:12
  • 2
    Worth noting at this point that this is the only answer that addresses changing the upper case b in `Bro` to lower case, as is shown in the desired result of the question. – Rich Scriven May 02 '17 at 12:06
  • @RichScriven I don't need to lower case, how do I keep the case? – Herman Toothrot Jun 07 '17 at 17:33
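
A small base R sketch of what the `gsub("\\s+", " ", ...)` step in this answer does on its own; the `messy` string is a made-up example. It shows that `\\s+` also collapses tabs and newlines, that the case of the text is untouched unless the outer `str_replace(..., "B", "b")` is applied, and that a single leading and trailing space survive, which is why the answer also calls `str_trim`.

```r
# Hypothetical input mixing spaces, a tab, and a newline
messy <- "  Hi\tbuddy \n  what's up   Bro "

# Every run of whitespace becomes exactly one space
collapsed <- gsub("\\s+", " ", messy)
collapsed
# [1] " Hi buddy what's up Bro "
```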
71

Or simply try the str_squish function from stringr:

library(stringr)
string <- "  Hi buddy   what's up   Bro "
str_squish(string)
# [1] "Hi buddy what's up Bro"
Henrik
43

Another approach using a single regex:

gsub("(?<=[\\s])\\s*|^\\s+|\\s+$", "", string, perl=TRUE)

Explanation (from)

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    [\s]                     any character of: whitespace (\n, \r,
                             \t, \f, and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
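  ^                        the beginning of the string
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string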
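
As a quick check of the explanation, here is a minimal sketch that applies the pattern to a made-up string (the inputs and variable names are illustrative only). One thing that seems worth knowing: because the look-behind keeps the first whitespace character of each run and deletes the rest, a run that starts with a tab is reduced to a tab rather than to a space.

```r
pattern <- "(?<=[\\s])\\s*|^\\s+|\\s+$"

# Hypothetical input: runs of spaces, one run containing a tab
string <- "  Hi   buddy \t what's up   Bro  "
result <- gsub(pattern, "", string, perl = TRUE)
result
# [1] "Hi buddy what's up Bro"

# A run that begins with a tab keeps that tab
tab_result <- gsub(pattern, "", "a\t\tb", perl = TRUE)
tab_result
# [1] "a\tb"
```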
MichaelChirico
Tyler Rinker
33

You do not need to import external libraries to perform such a task:

string <- " Hi        buddy        what's up    Bro "
string <- gsub("\\s+", " ", string)
string <- trimws(string)
string
# [1] "Hi buddy what's up Bro"

Or, in one line:

string <- trimws(gsub("\\s+", " ", string))

Much cleaner.

Adam Erickson
  • 4
    This doesn't depend on any external libraries, and it's also not a nightmarish regex like Tyler Rinker's. I wonder why you don't have more upvotes. – Rodrigo Sep 13 '20 at 00:34
  • 1
    I also wonder why @heisenbug47 exactly duplicated my answer half a year later. – Adam Erickson Dec 15 '21 at 15:23
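
If the one-liner is needed in several places, it could be wrapped in a small helper. This is only a sketch: the name `squish_ws` is hypothetical, and it assumes R 3.2.0 or later, where `trimws` was introduced. Both `gsub` and `trimws` are vectorised, so the helper works on a whole character vector and passes `NA` through.

```r
# Hypothetical helper: collapse whitespace runs, then trim both ends
squish_ws <- function(x) {
  trimws(gsub("\\s+", " ", x))
}

squish_ws(c(" Hi        buddy        what's up    Bro ", "tab\t\tseparated", NA))
# [1] "Hi buddy what's up Bro" "tab separated"          NA
```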
6

The qdapRegex package has the rm_white function to handle this:

library(qdapRegex)
rm_white(string)

## [1] "Hi buddy what's up Bro"
Tyler Rinker
4

You could also try clean from qdap

library(qdap)
library(stringr)
str_trim(clean(string))
#[1] "Hi buddy what's up Bro"

Or as suggested by @Tyler Rinker (using only qdap)

Trim(clean(string))
#[1] "Hi buddy what's up Bro"
akrun
1

There is no need to load any extra libraries for this, as gsub() from base R does the work.
Remove leading and trailing whitespace with trimws() and replace the extra whitespace using gsub(), as mentioned by @Adam Erickson.

    string = " Hi        buddy        what's up    Bro "
    trimws(gsub("\\s+", " ", string))

Here \\s+ matches one or more whitespace characters and gsub replaces each match with a single space.

To find out what any regular expression is doing, visit this link as mentioned by @Tyler Rinker.
Just paste in the regular expression you want explained and it will do the rest.

heisenbug47
0

Another solution using strsplit:

Split the text into words, then concatenate the words using the paste function.

string <- "Hi        buddy        what's up    Bro" 
stringsplit <- sapply(strsplit(string, " "), function(x){x[!x ==""]})
paste(stringsplit ,collapse = " ")

For more than one document:

string <- c("Hi        buddy        what's up    Bro"," an  example using       strsplit ") 
stringsplit <- lapply(strsplit(string, " "), function(x){x[!x ==""]})
sapply(stringsplit ,function(d) paste(d,collapse = " "))
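
Since the question says a "space" may also be a tab, a possible variant of the same idea splits on the regex `\\s+` instead of a single literal space. This is a sketch with made-up input, not the answer's original code. Trimming first means `strsplit` does not produce an empty first element, so no filtering of `""` is needed.

```r
# Hypothetical documents containing spaces and tabs
docs <- c("Hi \t buddy   what's up    Bro", " an  example using\tstrsplit ")

# Trim the ends, then split on any run of whitespace
words <- strsplit(trimws(docs), "\\s+")

# Re-join each document's words with a single space
joined <- vapply(words, paste, character(1), collapse = " ")
joined
# [1] "Hi buddy what's up Bro"    "an example using strsplit"
```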


Sam S.
0

This seems to work.
Unlike Rich Scriven's answer, it doesn't remove whitespace at the beginning or end of the string, but it does merge multiple whitespace characters into one.

library("stringr")
string <- "Hi     buddy     what's      up       Bro"
str_replace_all(string, "\\s+", " ")
# [1] "Hi buddy what's up Bro"
alejandro00