How can I write a Python regular expression to match multiple spaces in a row?

Question

I have a string like the one below, which has multiple spaces both before and after 'READY'.

All of the whitespace in the following examples consists of ordinary space characters.

'1df34343 43434sebb              READY                     '

How can I write a regular expression that returns '1df34343 43434sebb' as result.group(1)?

Is there just a single space between `3` and `4` in `..343 434..`? – Rohit Jain Nov 28 '12 at 10:22

score 3 · Accepted Answer · answered Nov 28 '12 at 10:52


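This captures the required group only if it is followed by two or more whitespace characters and then READY. It uses a positive look-ahead, so the spaces and READY are checked but not included in the match.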

(\S+ \S+)(?=\s{2,}READY)
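A minimal sketch of applying this pattern with `re.search`. The second string is a hypothetical line where READY is absent, included only to show that the look-ahead makes the whole match fail instead of capturing something else:

```python
import re

# The look-ahead (?=\s{2,}READY) requires 2+ whitespace characters followed
# by READY, but does not consume them, so the match holds only the text.
pattern = re.compile(r"(\S+ \S+)(?=\s{2,}READY)")

s = '1df34343 43434sebb              READY                     '
m = pattern.search(s)
print(m.group(1))  # 1df34343 43434sebb
print(m.group(0))  # same text: the zero-width look-ahead adds nothing to the match

# Hypothetical line without READY: the look-ahead fails, so search() returns None.
print(pattern.search('1df34343 43434sebb              DONE'))  # None
```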


garyh


Answer 2 · answered Nov 28 '12 at 10:32 by Inbar Rose

If you understand regular expressions, you should know the following:

\s : a whitespace character (including spaces, tabs and newlines)
\S : a non-whitespace character
+ : one or more of the preceding token
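To see what each token matches on its own, here is a short sketch using `re.findall` on the question's string. The tab/newline string is a hypothetical input added only to show that `\s` covers more than the space character:

```python
import re

s = '1df34343 43434sebb              READY                     '

# \S+ : each maximal run of one or more non-whitespace characters
print(re.findall(r'\S+', s))  # ['1df34343', '43434sebb', 'READY']

# \s+ : each maximal run of whitespace characters
runs = re.findall(r'\s+', s)
print(len(runs))  # 3: the single space between the IDs plus two multi-space runs

# \s is not limited to ' ': it also matches tabs and newlines
print(re.findall(r'\S+', 'abc\tdef\nghi'))  # ['abc', 'def', 'ghi']
```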

script:

>>> import re
>>> s = '1df34343 43434sebb              READY                     '
>>> ms = re.match(r"(\S+ \S+)\s+(\S+)\s+", s)
>>> ms.groups()
('1df34343 43434sebb', 'READY')
>>> ms.group(1)
'1df34343 43434sebb'
>>> ms.group(2)
'READY'
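One detail worth checking in this pattern: the final `\s+` requires at least one whitespace character after `READY`. A small sketch, assuming a line might arrive with its trailing spaces already stripped (for example by `str.rstrip()`), shows that the match then fails and that dropping the trailing `\s+` avoids this:

```python
import re

s = '1df34343 43434sebb              READY                     '
stripped = s.rstrip()  # same line without the trailing spaces

strict = re.compile(r"(\S+ \S+)\s+(\S+)\s+")  # pattern from the answer above
relaxed = re.compile(r"(\S+ \S+)\s+(\S+)")    # trailing \s+ removed

print(strict.match(stripped))            # None: nothing left for the final \s+
print(relaxed.match(stripped).groups())  # ('1df34343 43434sebb', 'READY')
print(relaxed.match(s).groups())         # ('1df34343 43434sebb', 'READY')
```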

You can also use a regex with nested groups if you ever need a more detailed parse of the string:

>>> ms = re.match(r"((\S+) (\S+))\s+(\S+)\s+", s)
>>> ms.groups()
('1df34343 43434sebb', '1df34343', '43434sebb', 'READY')
>>> ms.group(1)
'1df34343 43434sebb'
>>> ms.group(2)
'1df34343'
>>> ms.group(3)
'43434sebb'
>>> ms.group(4)
'READY'
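With several groups, Python's named groups (`(?P<name>...)`) can be easier to read than positional indices. A sketch of the same nested pattern with names; the group names `id`, `first`, `second` and `status` are illustrative choices, not anything from the question:

```python
import re

s = '1df34343 43434sebb              READY                     '

# Same structure as ((\S+) (\S+))\s+(\S+)\s+, but each group has a name.
pattern = re.compile(
    r"(?P<id>(?P<first>\S+) (?P<second>\S+))\s+(?P<status>\S+)\s+"
)

m = pattern.match(s)
print(m.group('id'))      # 1df34343 43434sebb
print(m.group('status'))  # READY
print(m.groupdict())      # all named groups as a dict
```

Named groups are still numbered, so `m.group(1)` and `m.group('id')` return the same text.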

score 1 · Answer 3 · answered Nov 28 '12 at 10:27

Here is a very simple regex that captures everything up to the first two whitespace characters in a row:

In [10]: import re

In [11]: s = '1df34343 43434sebb              READY                     '

In [12]: re.match(r'(.*?)\s\s', s).groups()
Out[12]: ('1df34343 43434sebb',)

This captures your requirements as I've understood them. If something is amiss, please clarify.
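The `?` in `(.*?)` is what makes this work: it makes `.*` lazy, so the match stops at the first pair of whitespace characters. A quick sketch contrasting it with the greedy form, plus a guard for lines with no double whitespace (where `re.match` returns `None` and calling `.groups()` would raise `AttributeError`). The `first_field` helper is a hypothetical name used only for illustration:

```python
import re

s = '1df34343 43434sebb              READY                     '

lazy = re.match(r'(.*?)\s\s', s).group(1)
greedy = re.match(r'(.*)\s\s', s).group(1)

print(repr(lazy))         # '1df34343 43434sebb'
print('READY' in greedy)  # True: greedy .* runs on to the last pair of spaces

def first_field(line):
    """Hypothetical helper: text before the first double whitespace, or None."""
    m = re.match(r'(.*?)\s\s', line)
    return m.group(1) if m else None

print(first_field(s))             # 1df34343 43434sebb
print(first_field('no doubles'))  # None
```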

score 0 · Answer 4 · answered Nov 28 '12 at 10:25

Match anything before a multi-space group:

 re.compile(r'^(.*?)(?:\s{2,})')

Example session:

>>> import re
>>> multispace = re.compile(r'^(.*?)(?:\s{2,})')
>>> multispace.match('1df34343 43434sebb              READY                     ').groups()
('1df34343 43434sebb',)
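Because the pattern is compiled once, it can be reused across many lines, for example when reading command output line by line. A sketch, where the second and third sample lines are hypothetical inputs made up for illustration; lines without a run of two or more whitespace characters are skipped:

```python
import re

multispace = re.compile(r'^(.*?)(?:\s{2,})')

# Hypothetical sample lines; only the first comes from the question.
lines = [
    '1df34343 43434sebb              READY                     ',
    'aaaa1111 bbbb2222    BUSY',
    'no-gap-here',
]

names = []
for line in lines:
    m = multispace.match(line)
    if m:  # match() returns None when there is no run of 2+ whitespace
        names.append(m.group(1))

print(names)  # ['1df34343 43434sebb', 'aaaa1111 bbbb2222']
```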

score 0 · Answer 5 · answered Nov 28 '12 at 10:35


Why not just split your string on runs of 2 or more spaces? You get a list whose first element is the one you need. You don't really need a complex regex for that:

>>> s = '1df34343 43434sebb              READY                     '
>>> import re
>>> re.split(r'[ ]{2,}', s)[0]
'1df34343 43434sebb'
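It can help to look at the whole list that `re.split` returns. Because the string ends with spaces, the last element is an empty string; passing `maxsplit=1` stops after the first separator, which is all that is needed when only the first field matters. A sketch:

```python
import re

s = '1df34343 43434sebb              READY                     '

parts = re.split(r'[ ]{2,}', s)
print(parts)  # ['1df34343 43434sebb', 'READY', '']

# Only the first field is wanted, so split at most once.
head, rest = re.split(r'[ ]{2,}', s, maxsplit=1)
print(head)   # 1df34343 43434sebb
```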


Rohit Jain


What is the benefit of this? I don't think this is more efficient than matching, and you lose some functionality in the long run. – Inbar Rose Nov 28 '12 at 10:47
@InbarRose.. Well, I think `split` fits best for the string the OP has posted. It's not a question of good or bad. Even `split` uses a `regex`, and I do love `Regex` myself. But for this particular case, it seems overkill to build a regex to match the complete string when the OP just wants the first part. – Rohit Jain Nov 28 '12 at 10:50
