176

I have the output of a command in tabular form. I'm parsing this output from a result file and storing it in a string. The elements in each row are separated by one or more whitespace characters, so I'm using a regular expression to match one or more spaces and split on them. However, the result contains a single-space element between every pair of items:

>>> import re
>>> str1 = "a    b     c      d"  # spaces are irregular
>>> str1
'a    b     c      d'
>>> str2 = re.split("( )+", str1)
>>> str2
['a', ' ', 'b', ' ', 'c', ' ', 'd']  # 1 space element between!

Is there a better way to do this?

After each split, str2 is appended to a list.

mkrieger1
gjois
  • 2
    I downvoted this question. Reason is that while the question itself is relevant the given example is not hard enough to really require the requested solution. A regex would be required if you have for instance blocks of words, blocks of numbers and you want to separate them into different variables. – erikbstack Mar 03 '18 at 10:42
  • @erikbwork I wanted to remove the unwanted space item in resultant string `'str2'` – gjois Mar 04 '18 at 14:20
  • 2
    Yes and you can achieve that with simply using `str1.split()`. No need for a regex. – erikbstack Mar 04 '18 at 20:44
  • 1
    Does this answer your question? [Split Strings into words with multiple word boundary delimiters](https://stackoverflow.com/questions/1059559/split-strings-into-words-with-multiple-word-boundary-delimiters) – Evandro Coan Dec 28 '21 at 01:30
  • 1
    Kudos to @erikbstack for explaining a downvote. Title and body don't go together. Either the title should be about including delimiters in the re.split() output, or the body should reflect the simpler title, e.g. `['1', '2', '3'] == re.split(r'[, ]+', "1,2 3")` – Bob Stein Feb 20 '23 at 17:02

4 Answers

217

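By wrapping the space in parentheses, ( ), you create a capturing group, and re.split includes the text matched by capturing groups in its result. If you simply remove the parentheses, you will not have this problem.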

>>> str1 = "a    b     c      d"
>>> re.split(" +", str1)
['a', 'b', 'c', 'd']

However, there is no need for a regex here. Calling str.split without a delimiter splits on runs of whitespace for you, which is the best approach in this case.

>>> str1.split()
['a', 'b', 'c', 'd']

If you really want a regex, you can use this ('\s' matches any whitespace character, and it's clearer):

>>> re.split(r"\s+", str1)
['a', 'b', 'c', 'd']

Or you can find all runs of non-whitespace characters:

>>> re.findall(r'\S+',str1)
['a', 'b', 'c', 'd']
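
One difference between these options shows up when a line has leading or trailing whitespace. The following is only a small sketch with a made-up sample string (the variable names are illustrative, not from the question): re.split yields empty strings at the ends, while str.split() and re.findall do not.

```python
import re

# Hypothetical sample line with leading/trailing whitespace and a tab
line = "  a    b \t c      d  "

with_regex_split = re.split(r"\s+", line)  # empty strings appear at both ends
with_str_split = line.split()              # surrounding whitespace is ignored
with_findall = re.findall(r"\S+", line)    # only the non-whitespace runs match

print(with_regex_split)
print(with_str_split)
print(with_findall)
```

If the parsed rows may carry such padding, str.split() or re.findall avoids a separate cleanup step.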
jamylak
27

Called without arguments, the str.split method splits on runs of whitespace, so no whitespace is left between or around the items:

>>> str1 = "a    b     c      d"
>>> str1.split()
['a', 'b', 'c', 'd']

Docs are here: http://docs.python.org/library/stdtypes.html#str.split
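
For the tabular case described in the question, the same call can be applied line by line. This is a minimal sketch that assumes the command output is already in a string; the sample text and the names output and rows are hypothetical.

```python
# Hypothetical command output; the real text would come from the result file
output = """\
name    size   owner
a.txt   120    alice
b.txt     7    bob
"""

# Split each non-empty line on runs of whitespace and collect the rows
rows = [line.split() for line in output.splitlines() if line.strip()]

for row in rows:
    print(row)
```

Each entry of rows is a list of the fields from one line, which matches the "append each split to a list" step in the question.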

Jeff Tratner
Trevor
8

When you use re.split and the split pattern contains capturing groups, the groups are retained in the output. If you don't want this, use a non-capturing group instead.
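
As a short illustration using the string from the question, a non-capturing group is written (?:...). The variable names captured and non_capturing are only for this sketch.

```python
import re

str1 = "a    b     c      d"

# Capturing group: the last space matched by the group is kept in the result
captured = re.split(r"( )+", str1)

# Non-capturing group: the separators are dropped
non_capturing = re.split(r"(?: )+", str1)

print(captured)       # ['a', ' ', 'b', ' ', 'c', ' ', 'd']
print(non_capturing)  # ['a', 'b', 'c', 'd']
```

For a pattern this simple the group is not needed at all, but the non-capturing form is useful when the separator pattern needs grouping, for example for alternation.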

BrenBarn
  • 2
    Using `str.split` is probably better for your example. I just wanted to explain why you get the behavior you do. – BrenBarn Jun 11 '12 at 06:05
2

It's very simple, actually. Try this:

str1 = "a    b     c      d"
splitStr1 = str1.split()  # no argument: split on runs of whitespace
print(splitStr1)
damned