How do I trim whitespace?

Question

Is there a Python function that will trim whitespace (spaces and tabs) from a string?

So that given input " \t example string\t " becomes "example string".

Thanks for the heads up. I'd discovered the strip function earlier, but it doesn't seem to be working for my input.. — Chris, Jul 26 '09 at 21:00
Same as: http://stackoverflow.com/questions/761804/trimming-a-string-in-python (even though this question is slightly clearer, IMHO). This is also almost the same: http://stackoverflow.com/questions/959215/removing-starting-spaces-in-python — Jonik, Jul 26 '09 at 21:17
The characters python considers whitespace are stored in `string.whitespace`. — John Fouhy, Jul 26 '09 at 22:09
By "strip function" do you mean strip method? " it doesn't seem to be working for my input" Please provide your code, your input and the output. — S.Lott, Jul 27 '09 at 00:07
For everything? How about equals ignore case? That is an unfortunate case where it is much easier in nearly every other language. — demongolem, Jul 13 '11 at 18:29
http://stackoverflow.com/questions/3739909/how-to-strip-all-whitespace-from-string — vipin, Apr 17 '15 at 13:17
Possible duplicate of [Trimming a string in Python](https://stackoverflow.com/questions/761804/trimming-a-string-in-python) — Breno Baiardi, Jun 19 '17 at 16:24

score 1754 · Accepted Answer · edited Jun 22 '22 at 04:15

For whitespace on both sides, use str.strip:

s = "  \t a string example\t  "
s = s.strip()

For whitespace on the right side, use str.rstrip:

s = s.rstrip()

For whitespace on the left side, use str.lstrip:

s = s.lstrip()

You can pass an argument to any of these methods to strip arbitrary characters, like this:

s = s.strip(' \t\n\r')

This will strip any space, \t, \n, or \r characters from both sides of the string.
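As a minimal sketch of what each method returns for the sample input (the variable names `both`, `right`, and `left` are illustrative only), note that the optional argument is treated as a set of characters, not as a prefix or suffix:

```python
s = "  \t a string example\t  "

both = s.strip()     # whitespace removed from both ends
right = s.rstrip()   # only trailing whitespace removed
left = s.lstrip()    # only leading whitespace removed

print(repr(both))    # 'a string example'
print(repr(right))   # '  \t a string example'
print(repr(left))    # 'a string example\t  '

# The argument is a set of characters to remove, in any order.
print("xxyhelloyx".strip("xy"))  # hello
```

Because strings are immutable, each call returns a new string and leaves `s` unchanged.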

The examples above only remove characters from the left-hand and right-hand sides of strings. If you also want to remove whitespace from the middle of a string, try re.sub:

import re
print(re.sub(r'\s+', '', s))

That should print out:

astringexample
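A small sketch of why the exact pattern matters, using a made-up sample string: `\s+` matches runs of whitespace, whereas `[\s+]` is a character class that matches a whitespace character or a literal `+`.

```python
import re

s = "a string\t example + more"

# \s+ matches one or more whitespace characters.
no_ws = re.sub(r"\s+", "", s)
print(no_ws)          # astringexample+more

# [\s+] is a character class: a whitespace character OR a literal '+'.
no_ws_or_plus = re.sub(r"[\s+]", "", s)
print(no_ws_or_plus)  # astringexamplemore
```

Writing the pattern as a raw string (`r"..."`) keeps Python from interpreting the backslash before the regex engine sees it.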

edited Jun 22 '22 at 04:15 · Boris Verkhovskiy

answered Jul 26 '09 at 20:56 · James Thompson
4

Results for the examples should be quite helpful :) – ton Mar 12 '14 at 08:01
5

No need to list the whitespace characters: http://docs.python.org/2/library/string.html#string.whitespace – jesuis Mar 24 '14 at 13:56
As pointed by `mh`, you don't need to specify ' \t\n\r', as those will be stripped by default. – Jay Taylor Nov 14 '14 at 20:27
1

None of the above seem to strip all white spaces in some cases. I still have tones of tabs in the middle of a string. – imrek Sep 17 '15 at 13:55
@Drunken Master - Sorry for not being more clear, the examples above are only for left-hand, right-hand and both sides of a string. I've added another example that shows how to remove whitespace from the middle of a string. – James Thompson Sep 19 '15 at 18:44
The regex in the final example should be `[\s]+`. Tagging the string as a raw in a regex is something I typically do, so `print re.sub(r'[\s]+','',s)`. More about raw strings and escaping here: http://stackoverflow.com/questions/2241600/python-regex-r-prefix – Smoke Liberator Aug 18 '16 at 05:28
3

The last example is exactly as using `str.replace(" ","")`. You don't need to use `re`, unless you have more than one space, then your example doesn't work. `[]` is designed to mark single characters, it's unnecessary if you're using just `\s`. Use either `\s+` or `[\s]+` (unnecessary) but `[\s+]` doesn't do the job, in particular if you want to replace the multiple spaces with a single one like turning `"this example"` into `"this example"`. – Jorge E. Cardona Aug 18 '16 at 17:54
4

@JorgeE.Cardona - One thing you're slightly wrong about - `\s` will include tabs while `replace(" ", "")` won't. – ArtOfWarfare Mar 30 '17 at 17:54
One of those fun things about coming from JS to Python is trying to remember differences like this! Strip, not trim. Strip, not trim. – Matt Fletcher Dec 14 '17 at 10:43

score 82 · Answer 2 · edited May 21 '22 at 09:09

In Python trim methods are named strip:

str.strip()  # trim
str.lstrip()  # left trim
str.rstrip()  # right trim

edited May 21 '22 at 09:09 · hc_dev

answered Feb 17 '12 at 10:00 · gcb

5

which is easy to remember because s**tri**p looks almost like **tri**m. – isar Apr 02 '18 at 14:35

score 24 · Answer 3 · edited Oct 23 '17 at 11:58

For leading and trailing whitespace:

s = '   foo    \t   '
print(s.strip())  # prints "foo"

Otherwise, a regular expression works:

import re
pat = re.compile(r'\s+')
s = '  \t  foo   \t   bar \t  '
print(pat.sub('', s))  # prints "foobar"
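If the words should stay separated, one option is to substitute a single space instead of the empty string and then trim the ends. This is only a sketch; the names `removed` and `collapsed` are illustrative:

```python
import re

pat = re.compile(r"\s+")
s = "  \t  foo   \t   bar \t  "

removed = pat.sub("", s)             # all whitespace deleted
collapsed = pat.sub(" ", s).strip()  # runs collapsed to one space, ends trimmed

print(removed)    # foobar
print(collapsed)  # foo bar
```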

edited Oct 23 '17 at 11:58 · blues

answered Jul 26 '09 at 20:56 · ars

1

You didn't compile your regex. You need to make it be `pat = re.compile(r'\s+')` – Evan Fosmark Jul 26 '09 at 21:02
You generally want to `sub(" ", s)`, not `""`; the latter will merge the words and you'll no longer be able to use `.split(" ")` to tokenize. – user3467349 Feb 13 '15 at 19:20
it would be nice to see the output of the `print` statements – Ron Klein Jun 09 '16 at 14:43

score 24 · Answer 4 · answered Jun 11 '14 at 14:18

You can also use a very simple, basic method, str.replace(), which works for both spaces and tabs:

>>> whitespaces = "   abcd ef gh ijkl       "
>>> tabs = "\tabcde\tfgh\tijkl"

>>> print(whitespaces.replace(" ", ""))
abcdefghijkl
>>> print(tabs.replace("\t", ""))
abcdefghijkl

Simple and easy.

answered Jun 11 '14 at 14:18 · Lucas

2

But this, alas, also removes interior space, while the example in the original question leaves interior spaces untouched. – Brandon Rhodes Jan 19 '18 at 17:41

score 12 · Answer 5 · answered Feb 13 '12 at 05:16

#how to trim a multi line string or a file

s=""" line one
\tline two\t
line three """

#line1 starts with a space, #2 starts and ends with a tab, #3 ends with a space.

s1=s.splitlines()
print(s1)
#[' line one', '\tline two\t', 'line three ']

print([i.strip() for i in s1])
#['line one', 'line two', 'line three']




#more details:

#we could also have used a for loop from the beginning:
for line in s.splitlines():
    line=line.strip()
    process(line)

#we could also be reading a file line by line.. e.g. my_file=open(filename), or with open(filename) as myfile:
for line in my_file:
    line=line.strip()
    process(line)

#side note: splitlines() removed the newline characters; we can keep them by passing True,
#although strip() will then remove them anyway.
s2=s.splitlines(True)
print(s2)
#[' line one\n', '\tline two\t\n', 'line three ']
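The per-line approach above can be wrapped in a small function that returns the trimmed text as a single string again. This is a sketch only; `strip_lines` is a hypothetical helper name, not something from the standard library:

```python
def strip_lines(text):
    """Strip leading/trailing whitespace from every line (hypothetical helper)."""
    return "\n".join(line.strip() for line in text.splitlines())


s = " line one\n\tline two\t\nline three "
result = strip_lines(s)
print(repr(result))  # 'line one\nline two\nline three'
```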

score 4 · Answer 6 · answered Feb 12 '13 at 02:22

No one has posted these regex solutions yet.

Matching:

>>> import re
>>> p=re.compile('\\s*(.*\\S)?\\s*')

>>> m=p.match('  \t blah ')
>>> m.group(1)
'blah'

>>> m=p.match('  \tbl ah  \t ')
>>> m.group(1)
'bl ah'

>>> m=p.match('  \t  ')
>>> print(m.group(1))
None

Searching (you have to handle the "only spaces" input case differently):

>>> p1=re.compile('\\S.*\\S')

>>> m=p1.search('  \tblah  \t ')
>>> m.group()
'blah'

>>> m=p1.search('  \tbl ah  \t ')
>>> m.group()
'bl ah'

>>> m=p1.search('  \t  ')
>>> m.group()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

If you use re.sub, you may remove inner whitespace, which could be undesirable.
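As a sketch, the "Matching" pattern can be wrapped so the whitespace-only case returns an empty string instead of None. `regex_trim` is a hypothetical helper, and it assumes single-line input, since `.` does not match newlines by default:

```python
import re

# Same pattern as the "Matching" example, written as a raw string.
TRIM = re.compile(r"\s*(.*\S)?\s*")


def regex_trim(text):
    """Return text without leading/trailing whitespace (hypothetical helper)."""
    m = TRIM.match(text)
    # group(1) is None when the input contains only whitespace.
    return m.group(1) or ""


print(repr(regex_trim("  \tbl ah  \t ")))  # 'bl ah'
print(repr(regex_trim("  \t  ")))          # ''
```

For plain trimming, `str.strip()` remains the simpler choice; the regex form is mainly of interest when the trim is part of a larger pattern.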

score 4 · Answer 7 · answered Nov 28 '15 at 05:45

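Whitespace includes spaces, tabs, and CR/LF. An elegant one-liner for removing all of it is str.translate. In Python 2:

' hello apple'.translate(None, ' \n\t\r')

In Python 3, translate takes a single table argument, which str.maketrans can build:

' hello apple'.translate(str.maketrans('', '', ' \n\t\r'))

Or, if you want to be thorough:

import string
' hello  apple'.translate(str.maketrans('', '', string.whitespace))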

score 3 · Answer 8 · answered Aug 08 '18 at 06:20

(re.sub(' +', ' ',(my_str.replace('\n',' ')))).strip()

This will remove all the unwanted spaces and newline characters. Hope this helps.

import re
my_str = '   a     b \n c   '
formatted_str = (re.sub(' +', ' ',(my_str.replace('\n',' ')))).strip()

The result:

'   a     b \n c   ' will be changed to 'a b c'

pbn · Answer 9 · 2019-11-14T14:01:19.750

    something = "\t  please_     \t remove_  all_    \n\n\n\nwhitespaces\n\t  "

    something = "".join(something.split())

output:

please_remove_all_whitespaces

Adding Le Droid's comment to the answer. To separate with a space:

    something = "\t  please     \t remove  all   extra \n\n\n\nwhitespaces\n\t  "
    something = " ".join(something.split())

output:

please remove all extra whitespaces

edited Nov 14 '19 at 14:01

answered Jun 19 '15 at 02:58 · pbn

1

Simple and efficient. Could use " ".join(... to keep words separated with a space. – Le Droid Oct 14 '16 at 22:54

score 2 · Answer 10 · answered Apr 18 '20 at 16:47

Having looked at quite a few solutions here with various degrees of understanding, I wondered what to do if the string was comma separated...

the problem

While trying to process a CSV of contact information, I needed a solution to this problem: trim extraneous whitespace and some junk, but preserve trailing commas and internal whitespace. Working with a field containing notes on the contacts, I wanted to remove the garbage, leaving the good stuff. While trimming out all the punctuation and chaff, I didn't want to lose the whitespace between compound tokens, as I didn't want to rebuild it later.

regex and patterns: `[\s_]+?\W+`

The pattern lazily matches one or more whitespace characters or underscores with [\s_]+? (as few characters as possible), followed by one or more non-word characters with \W+ (equivalent to [^a-zA-Z0-9_] for ASCII text). Specifically, this finds swaths of whitespace and the junk that follows them: tabs (\t), newlines (\n), form feeds (\f), and carriage returns (\r) are matched by \s, while null characters (\0) and punctuation are matched by \W.
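A small sketch of the pattern on a short, made-up sample (the string and variable names are illustrative only). A single space between two word characters is left alone because no non-word character follows it:

```python
import re

trim_ws_punctn = re.compile(r"[\s_]+?\W+")

sample = "hello-, -world, , , series a"
cleaned = trim_ws_punctn.sub(" ", sample)
print(cleaned)  # hello-, world, series a
```

Here " -" and " , , " each collapse to one space, while the space inside "series a" survives.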

I see the advantage to this as two-fold:

it doesn't remove whitespace between the complete words/tokens that you might want to keep together;
Python's built-in string method strip() doesn't operate inside the string, only at the left and right ends, and by default it strips whitespace (see the example below: several newlines are in the text, and strip() does not remove them while the regex pattern does). text.strip(' \n\t\r')

This goes beyond the OP's question, but I think there are plenty of cases where we might have odd, pathological instances within the text data, as I did (somehow the escape characters ended up in some of the text). Moreover, in list-like strings, we don't want to eliminate the delimiter unless the delimiter separates two whitespace characters or some non-word character, like '-,' or '-, ,,,'.

NB: Not talking about the delimiter of the CSV itself. Only of instances within the CSV where the data is list-like, ie is a c.s. string of substrings.

Full disclosure: I've only been manipulating text for about a month, and regex only the last two weeks, so I'm sure there are some nuances I'm missing. That said, for smaller collections of strings (mine are in a dataframe of 12,000 rows and 40 odd columns), as a final step after a pass for removal of extraneous characters, this works exceptionally well, especially if you introduce some additional whitespace where you want to separate text joined by a non-word character, but don't want to add whitespace where there was none before.

An example:

import re


text = "\"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, , , , \r, , \0, ff dd \n invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, \n i69rpofhfsp9t7c practice 20ignition - 20june \t\n .2134.pdf 2109                                                 \n\n\n\nklkjsdf\""

print(f"Here is the text as formatted:\n{text}\n")
print()
print("Trimming both the whitespaces and the non-word characters that follow them.")
print()
trim_ws_punctn = re.compile(r'[\s_]+?\W+')
clean_text = trim_ws_punctn.sub(' ', text)
print(clean_text)
print()
print("what about 'strip()'?")
print(f"Here is the text, formatted as is:\n{text}\n")
clean_text = text.strip(' \n\t\r')  # strip out whitespace?
print()
print(f"Here is the text, after stripping with 'strip':\n{clean_text}\n")

print()
print("Are 'text' and 'clean_text' unchanged?")
print(clean_text == text)

This outputs:

Here is the text as formatted:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf" 

using regex to trim both the whitespaces and the non-word characters that follow them.

"portfolio, derp, hello-world, hello-, world, founders, mentors, ffib, biff, 1, 12.18.02, 12, 2013, 9874890288, ff, series a, exit, general mailing, fr, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk,  jim.somedude@blahblah.com, dd invites,subscribed,, master, dd invites,subscribed, ff dd invites, subscribed, alumni spring 2012 deck: https: www.dropbox.com s, i69rpofhfsp9t7c practice 20ignition 20june 2134.pdf 2109 klkjsdf"

Very nice.
What about 'strip()'?

Here is the text, formatted as is:

"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"


Here is the text, after stripping with 'strip':


"portfolio, derp, hello-world, hello-, -world, founders, mentors, :, ?, %, ,>, , ffib, biff, 1, 12.18.02, 12,  2013, 9874890288, .., ..., ...., , ff, series a, exit, general mailing, fr, , , ,, co founder, pitch_at_palace, ba, _slkdjfl_bf, sdf_jlk, )_(, jim.somedude@blahblah.com, ,dd invites,subscribed,, master, , , ,  dd invites,subscribed, ,, , , ff dd 
 invites, subscribed, , ,  , , alumni spring 2012 deck: https: www.dropbox.com s, 
 i69rpofhfsp9t7c practice 20ignition - 20june 
 .2134.pdf 2109                                                 



klkjsdf"
Are 'text' and 'clean_text' unchanged? 'True'

So strip() only removes whitespace from the two ends of the string; here the text begins and ends with a quote character, so nothing is removed. In the OP's case, strip() is fine, but if things get any more complex, a regex with a similar pattern may be of some value in more general settings.

score 1 · Answer 11 · edited Nov 07 '18 at 05:03

If using Python 3: in your print call, finish with sep="". That stops print from inserting a space between its arguments.

EXAMPLE:

txt="potatoes"
print("I love ",txt,".",sep="")

This will print: I love potatoes.

Instead of: I love  potatoes .

Note that sep only controls what print inserts between arguments; it does not remove a \t that is already inside a string, so use strip() for that.

edited Nov 07 '18 at 05:03 · Lex

answered Nov 07 '18 at 04:20 · morgansmnm

海洋顶端 · Answer 12 · 2015-04-15T03:49:24.197

try translate

>>> import string
>>> print('\t\r\n  hello \r\n world \t\r\n')

  hello 
 world  
>>> tr = str.maketrans(string.whitespace, ' '*len(string.whitespace))
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr)
'     hello    world    '
>>> '\t\r\n  hello \r\n world \t\r\n'.translate(tr).replace(' ', '')
'helloworld'

edited Apr 15 '15 at 03:49

answered Apr 15 '15 at 03:43 · 海洋顶端

score 0 · Answer 13 · answered Feb 26 '19 at 16:48

If you want to trim the whitespace off just the beginning and end of the string, you can do something like this:

some_string = "    Hello,    world!\n    "
new_string = some_string.strip()
# new_string is now "Hello,    world!"

This works a lot like Qt's QString::trimmed() method, in that it removes leading and trailing whitespace, while leaving internal whitespace alone.

But if you'd like something like Qt's QString::simplified() method which not only removes leading and trailing whitespace, but also "squishes" all consecutive internal whitespace to one space character, you can use a combination of .split() and " ".join, like this:

some_string = "\t    Hello,  \n\t  world!\n    "
new_string = " ".join(some_string.split())
# new_string is now "Hello, world!"

In this last example, each sequence of internal whitespace is replaced with a single space, while the whitespace at the start and end of the string is still trimmed off.
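The reason this works is that split() with no argument treats any run of whitespace as one separator and discards empty strings at the ends. A brief sketch of that behaviour, contrasted with an explicit separator:

```python
some_string = "\t    Hello,  \n\t  world!\n    "

# With no argument, any run of whitespace is a single separator and
# leading/trailing whitespace produces no empty items.
words = some_string.split()
print(words)             # ['Hello,', 'world!']
print(" ".join(words))   # Hello, world!

# With an explicit " ", every single space is a separator.
print("a  b".split(" "))  # ['a', '', 'b']
```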

score -1 · Answer 14 · answered Oct 02 '15 at 12:35

Generally, I am using the following method:

>>> myStr = "Hi\n Stack Over \r flow!"
>>> charList = [u"\u005Cn",u"\u005Cr",u"\u005Ct"]
>>> import re
>>> for i in charList:
...     myStr = re.sub(i, r"", myStr)
...
>>> myStr
'Hi Stack Over  flow!'

Note: This only removes "\n", "\r" and "\t". It does not remove extra spaces.
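The same effect can probably be had in one pass with a character class instead of a loop. A minimal sketch, with `my_str` and `cleaned` as illustrative names:

```python
import re

my_str = "Hi\n Stack Over \r flow!"

# One substitution removes every newline, carriage return, and tab.
cleaned = re.sub(r"[\n\r\t]", "", my_str)
print(repr(cleaned))  # 'Hi Stack Over  flow!'
```

As with the loop, ordinary spaces are untouched, which is why two spaces remain between "Over" and "flow!".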

score -17 · Answer 15 · answered Jul 12 '17 at 20:22

This will remove all whitespace and newlines from both the beginning and end of a string:

>>> import re
>>> s = "  \n\t  \n   some \n text \n     "
>>> re.sub(r"^\s+|\s+$", "", s)
'some \n text'
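A quick check, sketched here with illustrative variable names, suggests that for this input the regex and `str.strip()` give the same result:

```python
import re

s = "  \n\t  \n   some \n text \n     "

via_regex = re.sub(r"^\s+|\s+$", "", s)
via_strip = s.strip()

print(repr(via_strip))         # 'some \n text'
print(via_regex == via_strip)  # True
```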

answered Jul 12 '17 at 20:22 · Rafe

8

Why use a regex when `s.strip()` does exactly this? – Ned Batchelder Jan 14 '18 at 14:38
2

`s.strip()` only handles the *initial* white space, but not whitespace "discovered" after removing other unwanted characters. Note that this will remove even the whitespace after the final leading `\n` – Rafe Jan 17 '18 at 18:36
Someone down-voted this answer but didn't explain why it is flawed. Shame on you (@NedBatchelder if the down vote was you please reverse as I explained your question and you didn't mention anything actually broken with my answer) – Rafe Jan 17 '18 at 18:37
10

Rafe, you might want to double-check: `s.strip()` produces precisely the same result as your regex. – Ned Batchelder Jan 17 '18 at 21:17
3

@Rafe, you're confusing it with trim. Strip does the required operations. – iMitwe Jan 19 '18 at 17:23
Wow you are right, thanks for pointing that out. I'd like to delete this answer if possible (is grayed out so maybe already done?) – Rafe Jan 30 '18 at 23:11
