Is there a simple way to remove multiple spaces in a string?

Question

Suppose this string:

The   fox jumped   over    the log.

Turning into:

The fox jumped over the log.

What is the simplest way (1-2 lines) to achieve this, without splitting and going into lists?

What is your aversion to lists? They are an integral part of the language, and " ".join(list_of_words) is one of the core idioms for making a list of strings into a single space-delimited string. — PaulMcG, Oct 09 '09 at 23:32
@Tom/@Paul: For simple strings, (string) join would be simple and sweet. But it gets more complex if there is other whitespace that one does NOT want to disturb... in which case "while" or regex solutions would be best. I've posted below a string-join that would be "correct", with timed test results for three ways of doing this. — pythonlarry, Apr 09 '13 at 21:49

score 881 · Accepted Answer · edited Jan 04 '19 at 14:59

>>> import re
>>> re.sub(' +', ' ', 'The     quick brown    fox')
'The quick brown fox'

edited Jan 04 '19 at 14:59 by Francisco

answered Oct 09 '09 at 21:52 by Josh Lee

This solution only handles single space characters. It wouldn't replace a tab or other whitespace characters handled by \s like in nsr81's solution. – Taylor Leese Oct 09 '09 at 22:21

That's true, `string.split` also handles all kinds of whitespaces. – Josh Lee Oct 10 '09 at 07:55

I prefer this one because it only focuses on the space character and doesn't affect characters like '\n's. – hhsaffar Oct 17 '14 at 20:13
This does not work for strings _beginning_ or _ending_ in at least one space. – gabchan May 29 '16 at 12:49

Yes, right. But strip() should be done before that. It will remove spaces from both ends. – Hardik Patel Dec 29 '16 at 12:46

You can use `re.sub(' {2,}', ' ', 'The quick brown fox')` to **prevent redundant replacements of single-space with single-space**. – AneesAhmed777 May 16 '18 at 13:51
what if a string has single space on ends? – Tahir Raza Apr 03 '19 at 10:53
This works, but is 1/6th the speed of the join/split solution below from @TaylorLeese – nerdfever.com Mar 20 '20 at 00:45

`\s` can be used to match other types of whitespace, or a character class like `[ \t]` if one of the chars should be ignored – ti7 Jul 15 '20 at 19:14
I found this solution helpful, especially for my Pandas DataFrame where I have a column of strings with extra spaces. Pandas string method accepts regular expression: `df.columnName.str.replace(' +', ' ', regex=True)` where df is a Pandas DataFrame. (In Pandas, this replace method was faster than a split/join method `df.columnName.str.split().str.join(' ')`). – blaylockbk Sep 10 '20 at 20:05

For the lazy ones who can't browse through all answers. `re.sub(r'\s{2,}', ' ', text)` will replace every run of two or more consecutive whitespace characters (spaces, tabs, newlines etc.) with a single space. – otaku Oct 29 '20 at 12:28
This fails in pypy-3.6 actually :( – Toilal Dec 23 '20 at 12:05
This doesn't work in python3 can anyone fix it? – Raymond Jan 01 '22 at 22:46
Why import a module when not necessary? – FifthAxiom Jun 11 '22 at 13:18
`' '.join(filter(None, the_string.split(' ')))` also keeps other whitespaces. (13 years later) – FifthAxiom Jun 11 '22 at 13:26
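Several comments above note that `' +'` only matches the ordinary space character and leaves a single leading or trailing space in place. Below is a minimal sketch that combines the accepted answer with `strip()`, as the comments suggest, and uses `' {2,}'` so single spaces are not rewritten. The names `MULTIPLE_SPACES` and `collapse_spaces` are only illustrative.

```python
import re

# Compiled once; ' {2,}' matches runs of two or more U+0020 spaces only,
# so tabs and newlines inside the text are left untouched
MULTIPLE_SPACES = re.compile(' {2,}')


def collapse_spaces(text):
    """Strip surrounding whitespace, then collapse runs of spaces to one."""
    return MULTIPLE_SPACES.sub(' ', text.strip())


print(repr(collapse_spaces('  The     quick brown    fox  ')))  # 'The quick brown fox'
print(repr(collapse_spaces('tab\there,   spaces   here')))     # 'tab\there, spaces here'
```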

score 817 · Answer 2 · edited Feb 02 '20 at 12:04

foo is your string:

" ".join(foo.split())

Be warned, though: this removes "all whitespace characters (space, tab, newline, return, formfeed)" (thanks to hhsaffar, see comments). I.e., "this is \t a test\n" will effectively end up as "this is a test".
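Here is a quick illustration of that warning. With no argument, `str.split()` splits on runs of any whitespace and discards leading and trailing whitespace, so tabs and newlines disappear along with the extra spaces:

```python
foo = "this is \t a test\n"

# split() with no separator splits on runs of any whitespace
# and never produces empty strings, even at the ends
words = foo.split()
print(words)            # ['this', 'is', 'a', 'test']
print(" ".join(words))  # this is a test
```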

edited Feb 02 '20 at 12:04 by Peter Mortensen

answered Oct 09 '09 at 21:52 by Taylor Leese

“Without splitting and going into lists...” – Gumbo Oct 09 '09 at 21:57

I ignored "Without splitting and going into lists..." because I still think it's the best answer. – Taylor Leese Oct 10 '09 at 03:44

This removes trailing spaces. If you want to keep them do: text[0:1] + " ".join(text[1:-1].split()) + text[-1] – user984003 Aug 12 '13 at 14:49

6x faster than the re.sub() solution, too. – nerdfever.com Mar 20 '20 at 00:45

@nerdfever.com how would you verify that it is 6x faster? – Astra Uvarova - Saturn's star Apr 21 '20 at 20:49

It is not obvious at first glance what this line of code does. Others will have a hard time figuring out why you would split the string and join it back together. The regex answer is more explicit about what it does. – Jeyekomon Mar 04 '21 at 12:24
Use `"\n".join(" ".join(line.split()) for line in foo.splitlines())` if you want to preserve the newline character – humble_barnacle Jun 11 '21 at 08:17

@fonzane `str.split()` with no specified character to split on by default splits on any whitespace character. It returns a list of all words (groups of non-whitespace characters). `" ".join` then joins that list together with a single space between each word. – Jamie Bull Jul 19 '21 at 08:34
combine with an apply and use it on pandas – Rockbar Oct 29 '21 at 08:07

I'm not sure what the OP's reasons are for wanting to avoid splitting and joining, but I can tell you mine: When working with my large dataframe, the split + rejoin uses so much memory it crashes. (I use PythonAnywhere and they kill the process.) The re.sub does not. – larapsodia Nov 13 '21 at 04:34
Sorry didn't work for me in python3 – Raymond Jan 01 '22 at 22:48
The `re`-module also uses `lists`. To only strip consecutive repeated spaces one can use `' '.join(filter(None, the_string.split(' ')))`. This will also remove the leading and trailing spaces but will keep newlines, tabs, etc. – FifthAxiom Jun 11 '22 at 13:37
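A note on splitting with an explicit `' '` separator. Every pair of adjacent spaces leaves an empty string in the result, so joining the pieces back with `' '` simply rebuilds the original string. To collapse the spaces you have to drop the empty pieces, for example with `filter(None, ...)`. A minimal sketch:

```python
s = '  The   fox\tjumped  \n over  '

parts = s.split(' ')
# Consecutive spaces produce empty strings, so this round-trips unchanged
assert ' '.join(parts) == s

# Dropping the empty strings collapses runs of spaces; tabs and newlines
# are kept, and leading/trailing spaces are removed
collapsed = ' '.join(filter(None, parts))
print(repr(collapsed))  # 'The fox\tjumped \n over'
```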

score 136 · Answer 3 · edited Feb 02 '20 at 11:44

import re
s = "The   fox jumped   over    the log."
re.sub(r"\s\s+" , " ", s)

or

re.sub(r"\s\s+", " ", s)

since the space before comma is listed as a pet peeve in PEP 8, as mentioned by user Martin Thoma in the comments.
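Here is a short check of what `\s\s+` does. It also illustrates two points from the comments below. First, `re.sub()` returns a new string rather than modifying `s`. Second, a lone tab is not normalized, because the pattern only matches runs of two or more whitespace characters:

```python
import re

s = "The   fox jumped   over    the log."
result = re.sub(r"\s\s+", " ", s)
print(result)  # The fox jumped over the log.
print(s)       # unchanged: strings are immutable

# Runs of mixed whitespace collapse to one space, but a single tab survives
print(repr(re.sub(r"\s\s+", " ", "a\tb  \n c")))  # 'a\tb c'
```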

edited Feb 02 '20 at 11:44 by Peter Mortensen

answered Oct 09 '09 at 21:52 by Nasir

I'd tend to change that regex to `r"\s\s+"` so that it doesn't try to replace already-single spaces. – Ben Blank Oct 09 '09 at 21:55

If you wanted that behavior, why not just `"\s{2,}"` instead of a workaround for not knowing moderately-advanced regex behavior? – Chris Lutz Oct 09 '09 at 22:06

remember that sub() does not change the input string `s`, but returns the new value. – gcb Aug 28 '13 at 06:49

@BenBlank Why is it bad if the regex replaces already-single spaces? Is that computationally more intensive? Why? What happens with each replacement operation? – Martin Thoma Jan 20 '15 at 12:21

@moose — It's a readability optimization rather than a performance one. `\s+` would cause the line to read "replace *one* or more spaces with a space", rather than "replace *two* or more spaces with a space". The former immediately makes me stop and think "Why replace one space with one space? That's silly." To me, that's a (very minor) code smell. I actually wouldn't expect there to be any performance difference at all between the two, as it's going to be copying into a new string anyway, and has to stop and test regardless of where the space is being copied *from*. – Ben Blank Jan 21 '15 at 21:20

@BenBlank: Ok, thank you. You also seem to want the space in `"\s\s+" , " "` before the `,` although it is against [PEP8](https://www.python.org/dev/peps/pep-0008/#pet-peeves). Could you please explain why? – Martin Thoma Jan 22 '15 at 06:50
@moose — I assume that's just a typo on the part of the original poster. I certainly wouldn't recommend it. :-) – Ben Blank Jan 23 '15 at 06:41
@BenBlank I've corrected this typo and he reverted it. So he seems to want it. – Martin Thoma Jan 23 '15 at 06:46

I'd advise against `\s\s+` because this won't normalize a TAB character back to a normal space. A SPACE + TAB does get replaced this way. – vdboor Jul 27 '15 at 10:35

I would also `strip()` (aka trim) the string before doing this as you probably do not want leading and trailing spaces. – Christophe Roussy Nov 10 '16 at 10:55
This seems to outperform the accepted answer here from Josh – radtek Feb 06 '18 at 19:52

score 65 · Answer 4 · edited Feb 02 '20 at 12:07

Using regexes with "\s" or a simple string.split() will also remove other whitespace, such as newlines, carriage returns and tabs. Unless that is desired, here are some examples that only collapse multiple spaces.

I used 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum to get realistic time tests and used random-length extra spaces throughout:

original_string = ''.join(word + (' ' * random.randint(1, 10)) for word in lorem_ipsum.split(' '))

A plain split/join one-liner would essentially strip any leading/trailing spaces. proper_join below instead preserves a leading/trailing space (but only ONE ;-).

# setup = '''

import re

def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')

    return string

def re_replace(string):
    return re.sub(r' {2,}' , ' ', string)

def proper_join(string):
    split_string = string.split(' ')

    # To account for leading/trailing spaces that would simply be removed
    beg = ' ' if not split_string[ 0] else ''
    end = ' ' if not split_string[-1] else ''

    # versus simply ' '.join(item for item in string.split(' ') if item)
    return beg + ' '.join(item for item in split_string if item) + end

original_string = """Lorem    ipsum        ... no, really, it kept going...          malesuada enim feugiat.         Integer imperdiet    erat."""

assert while_replace(original_string) == re_replace(original_string) == proper_join(original_string)

#'''

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string

# re_replace_test
new_string = original_string[:]

new_string = re_replace(new_string)

assert new_string != original_string

# proper_join_test
new_string = original_string[:]

new_string = proper_join(new_string)

assert new_string != original_string

NOTE: The "while version" made a copy of original_string, because I believed that once it was modified on the first run, successive runs would be faster (if only by a bit). As this adds time, I added the same string copy to the other two, so the timings show only the difference in the logic. Keep in mind that timeit runs the setup only once per repeat, while the main stmt is executed many times. In my original version, the while loop worked on the same label, original_string, so from the second run on there was nothing left to do. The way it's set up now, calling a function and using two different labels, that isn't a problem. (Strings are immutable, so the functions cannot modify original_string in place; in CPython, original_string[:] even returns the very same object.) I've added assert statements to all the workers to verify that something changes on every iteration (for those who may be dubious). E.g., change the test to this and it breaks:

# while_replace_test
new_string = original_string[:]

new_string = while_replace(new_string)

assert new_string != original_string # will break the 2nd iteration

while '  ' in original_string:
    original_string = original_string.replace('  ', ' ')

Tests run on a laptop with an i5 processor running Windows 7 (64-bit).

timeit.Timer(stmt = test, setup = setup).repeat(7, 1000)

test_string = 'The   fox jumped   over\n\t    the log.' # trivial

Python 2.7.3, 32-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001066 |   0.001260 |   0.001128 |   0.001092
     re_replace_test |   0.003074 |   0.003941 |   0.003357 |   0.003349
    proper_join_test |   0.002783 |   0.004829 |   0.003554 |   0.003035

Python 2.7.3, 64-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001025 |   0.001079 |   0.001052 |   0.001051
     re_replace_test |   0.003213 |   0.004512 |   0.003656 |   0.003504
    proper_join_test |   0.002760 |   0.006361 |   0.004626 |   0.004600

Python 3.2.3, 32-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001350 |   0.002302 |   0.001639 |   0.001357
     re_replace_test |   0.006797 |   0.008107 |   0.007319 |   0.007440
    proper_join_test |   0.002863 |   0.003356 |   0.003026 |   0.002975

Python 3.3.3, 64-bit, Windows
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.001444 |   0.001490 |   0.001460 |   0.001459
     re_replace_test |   0.011771 |   0.012598 |   0.012082 |   0.011910
    proper_join_test |   0.003741 |   0.005933 |   0.004341 |   0.004009

test_string = lorem_ipsum
# Thanks to http://www.lipsum.com/
# "Generated 11 paragraphs, 1000 words, 6665 bytes of Lorem Ipsum"

Python 2.7.3, 32-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.342602 |   0.387803 |   0.359319 |   0.356284
     re_replace_test |   0.337571 |   0.359821 |   0.348876 |   0.348006
    proper_join_test |   0.381654 |   0.395349 |   0.388304 |   0.388193    

Python 2.7.3, 64-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.227471 |   0.268340 |   0.240884 |   0.236776
     re_replace_test |   0.301516 |   0.325730 |   0.308626 |   0.307852
    proper_join_test |   0.358766 |   0.383736 |   0.370958 |   0.371866    

Python 3.2.3, 32-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.438480 |   0.463380 |   0.447953 |   0.446646
     re_replace_test |   0.463729 |   0.490947 |   0.472496 |   0.468778
    proper_join_test |   0.397022 |   0.427817 |   0.406612 |   0.402053    

Python 3.3.3, 64-bit
                test |    minimum |    maximum |    average |     median
---------------------+------------+------------+------------+-----------
  while_replace_test |   0.284495 |   0.294025 |   0.288735 |   0.289153
     re_replace_test |   0.501351 |   0.525673 |   0.511347 |   0.508467
    proper_join_test |   0.422011 |   0.448736 |   0.436196 |   0.440318

For the trivial string, the while loop is the fastest on every build. The Pythonic split/join and the regex trade places on Python 2.7, but the regex clearly pulls up the rear on Python 3.

For non-trivial strings, it seems there's a bit more to consider. 32-bit 2.7? It's regex to the rescue! 64-bit 2.7? A while loop is best, by a decent margin. 32-bit 3.2? Go with the "proper" join. 64-bit 3.3? Go for a while loop. Again.

In the end, one can improve performance if/where/when needed, but it's always best to remember the mantra:

Make It Work
Make It Right
Make It Fast

IANAL, YMMV, Caveat Emptor!
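For anyone who wants to rerun a comparison like this, here is a minimal self-contained sketch that uses `timeit.repeat` with a callable. The sample string and the `benchmark` helper are assumptions for illustration, not the exact setup used above. Results will vary by machine and Python version.

```python
import re
import timeit


def while_replace(string):
    while '  ' in string:
        string = string.replace('  ', ' ')
    return string


def re_replace(string):
    return re.sub(r' {2,}', ' ', string)


def proper_join(string):
    split_string = string.split(' ')
    # Keep a single leading/trailing space if the original had any
    beg = ' ' if not split_string[0] else ''
    end = ' ' if not split_string[-1] else ''
    return beg + ' '.join(item for item in split_string if item) + end


def benchmark(text, repeat=7, number=1000):
    """Return, per function, the median of `repeat` timings of `number` calls."""
    results = {}
    for func in (while_replace, re_replace, proper_join):
        times = timeit.repeat(lambda: func(text), repeat=repeat, number=number)
        results[func.__name__] = sorted(times)[len(times) // 2]
    return results


sample = 'The   fox jumped   over\n\t    the log.'
assert while_replace(sample) == re_replace(sample) == proper_join(sample)
for name, seconds in benchmark(sample).items():
    print(f'{name:>15}: {seconds:.6f} s')
```

Passing a lambda avoids building `stmt`/`setup` strings. The median is less sensitive to one-off hiccups than the average.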

I would have preferred if you had tested the simple `' '.join(the_string.split())`, as this is the usual use case, but I'd like to say thank you for your work! — wedi, Sep 29 '14 at 00:43
@wedi: Per other comments (like from [Gumbo](https://stackoverflow.com/questions/1546226/a-simple-way-to-remove-multiple-spaces-in-a-string-in-python/15913564#comment1403367_1546251); [user984003](https://stackoverflow.com/questions/1546226/a-simple-way-to-remove-multiple-spaces-in-a-string-in-python/15913564#comment26654990_1546883), though her/his solution is presumptive and won't work "in all cases"), this sort of solution doesn't adhere to the questioner's request. One may use .split(' '), and a comp/gen, but gets hairier to deal with lead/trailing spaces. — pythonlarry, Oct 26 '14 at 16:09
@wedi: E.g.: `' '.join(p for p in s.split(' ') if p)` <-- still lost lead/trailing spaces, but accounted for multiple spaces. To keep them, must do like `parts = s.split(' '); (' ' if not parts[0] else '') + ' '.join(p for p in s.split(' ') if p) + (' ' if not parts[-1] else '')`! — pythonlarry, Oct 26 '14 at 16:12
Thanks @pythonlarry for the mantra! and love the detailed test! I'm curious to know if your thoughts or views on this have changed, since it's been 6 years? — JayRizzo, May 15 '19 at 07:02
@Lee, could you provide an example of your thoughts on a generator version, please? — pythonlarry, Jul 18 '20 at 17:54

score 59 · Answer 5 · edited Feb 02 '20 at 12:05

I have to agree with Paul McGuire's comment. To me,

' '.join(the_string.split())

is vastly preferable to whipping out a regex.

My measurements (Linux and Python 2.5) show the split-then-join to be almost five times faster than doing the "re.sub(...)", and still three times faster if you precompile the regex once and do the operation multiple times. And it is by any measure easier to understand -- much more Pythonic.
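For reference, "precompile the regex once" looks something like the sketch below. Note that the two approaches are not equivalent. The split/join version also removes leading/trailing whitespace and collapses tabs and newlines, while the `' +'` pattern only touches spaces. The function names are only illustrative.

```python
import re

# Compile once at import time instead of passing the pattern on every call
MULTISPACE = re.compile(r' +')


def collapse_with_regex(s):
    return MULTISPACE.sub(' ', s)


def collapse_with_split(s):
    return ' '.join(s.split())


text = 'The   fox jumped   over    the log.'
print(collapse_with_regex(text) == collapse_with_split(text))  # True
print(repr(collapse_with_regex(' a\t\tb ')))  # ' a\t\tb '
print(repr(collapse_with_split(' a\t\tb ')))  # 'a b'
```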

edited Feb 02 '20 at 12:05 by Peter Mortensen

answered Oct 10 '09 at 02:39 by Kevin Little

This removes trailing spaces. If you want to keep them do: text[0:1] + " ".join(text[1:-1].split()) + text[-1] – user984003 Aug 12 '13 at 14:51

a simple regexp is much better to read. never optimize for performance before you need to. – gcb Aug 28 '13 at 06:46
@gcb: Why not? What if you're expecting a high throughput scenario (e.g. because of high demand)? Why not deploy something you expect to be less resource intensive from the get go in that scenario? – Hassan Baig Mar 03 '18 at 11:43

@HassanBaig if you already have the performance requirement, then it isn't really premature optimization, right? My point is for when you don't need to obsess about performance yet, it is always better to aim for readability. – gcb Mar 11 '18 at 19:47

@gcb 'Never optimize for performance before you need to', sounds like a revenue model for an IT business. Also I find a simple `regex` more difficult to read, so to me that's a matter of preference. `Split` and `join` are widely used and are available without importing extra modules. When it comes to software, one should have foresight and be prepared for some scenarios. Most developers make this mistake and customers start to complain after a while. Software must be able to handle known unforeseen situations. Speed, stability, security and -> uniformity is far more important than readability. – FifthAxiom Jun 11 '22 at 13:13
To only strip consecutive repeated spaces one can use `' '.join(filter(None, the_string.split(' ')))`. This will also remove the leading and trailing spaces but will keep newlines, tabs, etc. – FifthAxiom Jun 11 '22 at 13:14

score 21 · Answer 6 · answered Oct 09 '09 at 21:58

Similar to the previous solutions, but more specific: replace two or more whitespace characters with a single space:

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> re.sub(r'\s{2,}', ' ', s)
'The fox jumped over the log.'

answered Oct 09 '09 at 21:58 by Peter

Why are you repeating answer? – Raymond Jan 01 '22 at 22:51

score 19 · Answer 7 · edited Feb 02 '20 at 12:59

I have tried the following method, and it works even with an extreme case like:

str1='          I   live    on    earth           '

' '.join(str1.split())

But if you prefer a regular expression it can be done as:

re.sub(r'\s+', ' ', str1)

However, some preprocessing is needed to remove the leading and trailing spaces.
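As the comment on this answer suggests, stripping first makes the regex version give the same result as the split/join version on this input:

```python
import re

str1 = '          I   live    on    earth           '

# strip() removes the leading/trailing whitespace; \s+ collapses the rest
via_regex = re.sub(r'\s+', ' ', str1.strip())
via_split = ' '.join(str1.split())
print(repr(via_regex))  # 'I live on earth'
print(via_regex == via_split)  # True
```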

edited Feb 02 '20 at 12:59 by Peter Mortensen

answered Oct 19 '18 at 05:27 by ravi tanwar

Leading and trailing spaces could be easily removed by str1.strip() then pass it to your re.sub() as follows re.sub(' +', ' ', str1.strip()) – Youstanzr Apr 21 '22 at 08:54

score 18 · Answer 8 · edited Dec 15 '21 at 17:02

import re

Text = " You can select below trims for removing white space!!   BR Aliakbar     "
# trims all white spaces
print('Remove all space:', re.sub(r"\s+", "", Text), sep='')
# trims left space
print('Remove leading space:', re.sub(r"^\s+", "", Text), sep='')
# trims right space
print('Remove trailing spaces:', re.sub(r"\s+$", "", Text), sep='')
# trims both
print('Remove leading and trailing spaces:', re.sub(r"^\s+|\s+$", "", Text), sep='')
# replace each run of spaces in the string with a single space
print('Remove more than one space:', re.sub(' +', ' ', Text), sep='')

Result (quotes added to make leading/trailing spaces visible):

"Remove all space:Youcanselectbelowtrimsforremovingwhitespace!!BRAliakbar"
"Remove leading space:You can select below trims for removing white space!!   BR Aliakbar     "
"Remove trailing spaces: You can select below trims for removing white space!!   BR Aliakbar"
"Remove leading and trailing spaces:You can select below trims for removing white space!!   BR Aliakbar"
"Remove more than one space: You can select below trims for removing white space!! BR Aliakbar "

score 16 · Answer 9 · answered Nov 04 '15 at 06:11

A simple solution

>>> import re
>>> s = "The   fox jumped   over    the log."
>>> print(re.sub(r'\s+', ' ', s))
The fox jumped over the log.

answered Nov 04 '15 at 06:11 by Hafiz Muhammad Shafiq

score 11 · Answer 10 · answered Jun 19 '18 at 00:31

You can also use the string splitting technique in a Pandas DataFrame without needing to use .apply(..), which is useful if you need to perform the operation quickly on a large number of strings. Here it is on one line:

df['message'] = (df['message'].str.split()).str.join(' ')

score 11 · Answer 11 · edited Feb 02 '20 at 17:24

Solution for Python developers:

import re

text1 = 'Python      Exercises    Are   Challenging Exercises'
print("Original string: ", text1)
print("Without extra spaces: ", re.sub(' +', ' ', text1))

Output:

Original string:  Python      Exercises    Are   Challenging Exercises
Without extra spaces:  Python Exercises Are Challenging Exercises

edited Feb 02 '20 at 17:24 by Peter Mortensen

answered Jan 05 '20 at 07:12 by Chadee Fouad

this is using regex (google it) but basically ' +' means one or more spaces...so basically I'm replacing one or more spaces with a single space. – Chadee Fouad Mar 20 '21 at 02:51

score 9 · Answer 12 · edited Feb 02 '20 at 12:49

import re
string = re.sub('[ \t\n]+', ' ', 'The     quick brown                \n\n             \t        fox')

This will replace every run of spaces, tabs and newlines with a single space.

edited Feb 02 '20 at 12:49 by Peter Mortensen

answered Apr 18 '17 at 01:30 by Rakesh Kumar

But if you have whitespace (non-printable) characters not in your range like '\x00' to '\x0020' the code will not strip them. – Muskovets Jan 18 '19 at 10:02

score 5 · Answer 13 · answered Jul 15 '22 at 20:47

This one does exactly what you want

old_string = 'The   fox jumped   over    the log.'
new_string = " ".join(old_string.split())
print(new_string)

This results in:

The fox jumped over the log.

answered Jul 15 '22 at 20:47 by my_name_is_njuno

score 4 · Answer 14 · edited Feb 02 '20 at 12:13

One line of code to remove all extra spaces before, after, and within a sentence:

sentence = "  The   fox jumped   over    the log.  "
sentence = ' '.join(filter(None,sentence.split(' ')))

Explanation:

Split the entire string into a list.
Filter empty elements from the list.
Rejoin the remaining elements* with a single space

*The remaining elements should be words, or words with punctuation, etc. I did not test this extensively, but this should be a good starting point. All the best!
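Here is a minimal check of this approach. `filter(None, ...)` drops the empty strings that `split(' ')` produces for consecutive, leading and trailing spaces. Because the split is on `' '` only, tabs and newlines are kept as their own pieces (or as part of neighbouring words):

```python
sentence = "  The   fox jumped   over    the log.  "
result = ' '.join(filter(None, sentence.split(' ')))
print(repr(result))  # 'The fox jumped over the log.'

# A tab between spaces is kept; only the spaces around it are collapsed
print(repr(' '.join(filter(None, 'a   \t   b'.split(' ')))))  # 'a \t b'
```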

score 4 · Answer 15 · edited Feb 02 '20 at 12:55

The fastest you can get for user-generated strings is:

if '  ' in text:
    while '  ' in text:
        text = text.replace('  ', ' ')

The short circuiting makes it slightly faster than pythonlarry's comprehensive answer. Go for this if you're after efficiency and are strictly looking to weed out extra whitespaces of the single space variety.

edited Feb 02 '20 at 12:55 by Peter Mortensen

answered Mar 03 '18 at 17:30 by Hassan Baig

score 4 · Answer 16 · answered Jul 13 '20 at 19:14

" ".join(foo.split()) is not quite correct with respect to the question asked, because it also entirely removes single leading and/or trailing whitespace. So, if leading/trailing whitespace should instead be replaced by a single space, you could do something like the following:

" ".join(('*' + foo + '*').split())[1:-1]

Of course, it's less elegant.
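Here is the same sentinel trick wrapped in a small function. The name `collapse_keep_edges` is only illustrative. The `'*'` characters make any leading/trailing whitespace a separator between non-empty pieces, so it survives as exactly one space after the join. `[1:-1]` then removes the sentinels:

```python
def collapse_keep_edges(foo):
    # Wrap in sentinels so edge whitespace is not discarded by split()
    joined = " ".join(('*' + foo + '*').split())
    # Remove the two sentinel characters again
    return joined[1:-1]


print(repr(collapse_keep_edges("   The   fox jumped   over    the log.  ")))
# ' The fox jumped over the log. '
print(repr(collapse_keep_edges("no extra spaces")))  # 'no extra spaces'
```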

score 3 · Answer 17 · edited Feb 02 '20 at 11:50

Another alternative:

>>> import re
>>> s = 'this is a            string with    multiple spaces and    tabs'
>>> s = re.sub(r'[ \t]+', ' ', s)
>>> print(s)
this is a string with multiple spaces and tabs

edited Feb 02 '20 at 11:50 by Peter Mortensen

answered Jul 25 '12 at 10:19 by Kreshnik

score 3 · Answer 18 · answered Aug 13 '17 at 17:53

In some cases it's desirable to replace consecutive occurrences of every whitespace character with a single instance of that character. You'd use a regular expression with backreferences to do that.

(\s)\1{1,} matches any whitespace character, followed by one or more occurrences of that character. Now, all you need to do is specify the first group (\1) as the replacement for the match.

Wrapping this in a function:

import re

def normalize_whitespace(string):
    return re.sub(r'(\s)\1{1,}', r'\1', string)

>>> normalize_whitespace('The   fox jumped   over    the log.')
'The fox jumped over the log.'
>>> normalize_whitespace('First    line\t\t\t \n\n\nSecond    line')
'First line\t \nSecond line'

score 3 · Answer 19 · edited Sep 14 '20 at 14:58

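Quite surprising: no one has posted a simple function, which will be much faster than ALL the other posted solutions. Here it goes:

def compactSpaces(s):
    out = ""
    for c in s:
        # Keep non-spaces, and a space only if the last kept char is not a space
        # (so leading spaces are dropped and runs collapse to one space)
        if c != " " or (out and out[-1] != " "):
            out += c
    return out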

edited Sep 14 '20 at 14:58

answered Mar 04 '20 at 13:32 by rafal chlopek

How is this faster?! you're using a loop to go across the entire string. If this is a super large string it might take a long time. Regex is faster. That's not to mention that you took 5 lines when regex does it in 1 line. I prefer avoiding loops whenever possible. – Chadee Fouad Dec 12 '21 at 10:04

score 3 · Answer 20 · answered Jul 22 '20 at 18:16

Because @pythonlarry asked, here are the missing generator-based versions.

The groupby join is easy. itertools.groupby groups consecutive elements that share the same key and returns pairs of the key and an iterator over that group's elements. So when the key is a space, a single space is returned; otherwise, the entire group is returned.

from itertools import groupby

def group_join(string):
    return ''.join(' ' if char == ' ' else ''.join(group) for char, group in groupby(string))

The groupby variant is simple but very slow. So now for the generator variant. Here we consume an iterator, the string, and yield every character except spaces that directly follow another space.

def generator_join_generator(string):
    last = False  # was the previous character a space?
    for c in string:
        if c == ' ':
            if not last:
                last = True
                yield ' '
        else:
            last = False
            yield c

def generator_join(string):
    return ''.join(generator_join_generator(string))

So I measured the timings with some other Lorem Ipsum text:

while_replace 0.015868543065153062
re_replace 0.22579886706080288
proper_join 0.40058281796518713
group_join 5.53206754301209
generator_join 1.6673167790286243

With Hello and World separated by 64KB of spaces:

while_replace 2.991308711003512
re_replace 0.08232860406860709
proper_join 6.294375243945979
group_join 2.4320066600339487
generator_join 6.329648651066236

And not to forget the original sentence:

while_replace 0.002160938922315836
re_replace 0.008620491018518806
proper_join 0.005650000995956361
group_join 0.028368217987008393
generator_join 0.009435956948436797

Interestingly, for strings made up almost entirely of spaces, group_join is not much worse than the others. All timings show the median of seven runs of a thousand iterations each.
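Before timing variants like these, it is worth checking that they all return the same result. Below is a minimal sketch of such a check. It follows the answer's `group_join` and `generator_join` names, but the generator is written so that only the first space of each run is yielded. `reference` is an assumption used as the expected result:

```python
import re
from itertools import groupby


def group_join(string):
    # groupby yields (key, run) pairs for consecutive equal characters
    return ''.join(' ' if char == ' ' else ''.join(run)
                   for char, run in groupby(string))


def generator_join(string):
    def chars():
        last_was_space = False
        for c in string:
            if c == ' ':
                if not last_was_space:
                    yield ' '
                last_was_space = True
            else:
                last_was_space = False
                yield c
    return ''.join(chars())


def reference(string):
    return re.sub(' {2,}', ' ', string)


samples = ['The   fox jumped   over    the log.', '  leading', 'trailing  ',
           'Hello' + ' ' * 64 + 'World', 'tabs\t\tstay']
for s in samples:
    assert group_join(s) == generator_join(s) == reference(s), repr(s)
print('all variants agree')
```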

score 3 · Answer 21 · answered Feb 22 '23 at 23:27

This regex will do the trick in Python 3.11:

re.sub(r'\s+', ' ', text)

The accepted answer of this thread did not work for me in Python 3.11 on Mac:

re.sub(' +', ' ', 'The     quick brown    fox') # does not work for me
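For what it's worth, `re.sub(' +', ...)` behaves the same way in Python 3.11 as in earlier versions. One plausible explanation for a difference like this (an assumption, since the input text isn't shown) is that the text contains whitespace other than U+0020, such as tabs or non-breaking spaces (U+00A0). `' +'` matches only the ordinary space, while `\s` in a `str` pattern matches Unicode whitespace:

```python
import re

text = 'The\u00a0\u00a0quick \t brown    fox'  # non-breaking spaces and a tab

# ' +' only matches runs of U+0020, so the other whitespace survives
print(repr(re.sub(' +', ' ', text)))    # 'The\xa0\xa0quick \t brown fox'
# \s matches any Unicode whitespace in a str pattern
print(repr(re.sub(r'\s+', ' ', text)))  # 'The quick brown fox'
```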

answered Feb 22 '23 at 23:27 by mwx

score 1 · Answer 22 · edited Feb 02 '20 at 12:47

def unPretty(S):
    # Given a dictionary, JSON, list, float, int, or even a string,
    # convert it to a string, drop CR, replace LF with a space,
    # and reduce every run of whitespace to a single space.
    return ' '.join(str(S).replace('\n', ' ').replace('\r', '').split())

edited Feb 02 '20 at 12:47 by Peter Mortensen

answered Dec 15 '16 at 15:22 by jw51

score 0 · Answer 23 · edited Feb 02 '20 at 12:11

string = 'This is a             string full of spaces          and tabs'
string = string.split(' ')
while '' in string:
    string.remove('')
string = ' '.join(string)
print(string)

Results:

This is a string full of spaces and tabs

edited Feb 02 '20 at 12:11 by Peter Mortensen

answered Feb 14 '16 at 11:58 by Hassan Abdul-Kareem
score 0 · Answer 24 · edited Feb 02 '20 at 12:43

To remove white space, considering leading, trailing and extra white space in between words, use:

(?<=\s) +|^ +(?=\s)| (?= +[\n\0])

The first alternative deals with extra spaces that follow other whitespace, the second deals with leading spaces at the start of the string, and the last one deals with trailing spaces.

You can test it at this link:

https://regex101.com/r/meBYli/4

This is to be used with the re.split function.

score 0 · Answer 25 · edited Feb 02 '20 at 12:54

I haven't read a lot into the other examples, but I have just created this method for consolidating multiple consecutive space characters.

It does not use any libraries, and whilst it is relatively long in terms of script length, it is not a complex implementation:

def spaceMatcher(command):
    """
    Function defined to consolidate multiple consecutive space characters
    in a string into a single space
    """
    new_command = ""
    # Number of consecutive spaces seen so far in the current run
    space_match = 0
    for char in command:
        if char == " ":
            space_match += 1
            # Keep only the first space of each run
            if space_match == 1:
                new_command += char
        else:
            space_match = 0
            new_command += char
    return new_command

command = input("Please enter a command ->")
print(spaceMatcher(command))
print(list(spaceMatcher(command)))

score 0 · Answer 26 · edited Nov 10 '20 at 07:29

This does and will do: :)

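# python... 3.x
import operator
...
# line: line of text
# operator.ne compares values; operator.is_not would compare object identity,
# which is not guaranteed for equal strings
return " ".join(filter(lambda a: operator.ne(a, ""), line.strip().split(" ")))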

edited Nov 10 '20 at 07:29 by Gino Mempin

answered Nov 10 '20 at 01:30 by prabeennet

score -1 · Answer 27 · answered Nov 24 '22 at 06:21

Easiest solution ever!

a = 'The   fox jumped   over    the log.'
while '  ' in a: a = a.replace('  ', ' ')
print(a)

Output:

The fox jumped over the log.

answered Nov 24 '22 at 06:21 by Chadee Fouad

The challenge is to tackle multiple spaces. The number of these spaces are not fixed. Hence a fixed length replace approach is ineffective. – Farhan Hai Khan Jul 03 '23 at 09:18
@FarhanHaiKhan Please read the code well. This is NOT a fixed length replace approach so your downvote is on wrong and unfair basis. – Chadee Fouad Jul 13 '23 at 19:09
Please note that I called it ineffective and not incorrect. The number of iterations this can run for on a big string is insane, and that's why it's been downvoted. In other words, "Can you do better than that?" – Farhan Hai Khan Jul 14 '23 at 08:30
@FarhanHaiKhan please read the requirements again. He says "What is the SIMPLEST (1-2 lines) to achieve this". He's asking for simplicity not speed. This is literally 1 line solution and it meets the requirements. Once again your downvote is on wrong and unfair basis. – Chadee Fouad Jul 15 '23 at 20:48
There's a one liner for this using re.sub, simple and effective. Are you implying this is "simpler" in any way? – Farhan Hai Khan Jul 18 '23 at 08:17
@FarhanHaiKhan Yes, my solution is simpler and shorter. First, mine is a one liner and Second, re.sub is 2 liner because you have to import a library and it's difficult to understand for beginners. Your down vote is unfair and is pure arrogance given that you didn't read the requirements of the question and just making new requirements that he didn't ask for. This is counter productive and not encouraging people who are actually trying to help beginners. – Chadee Fouad Jul 24 '23 at 02:29
