
I have a string like the one below, with multiple spaces before and after 'READY'.

All the whitespace in the following examples consists of ordinary space characters:

'1df34343 43434sebb              READY                     '

How can I write a regular expression that gives '1df34343 43434sebb' as result.group(1)?

victorsc
michael

5 Answers


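This captures the required group when it is followed by two or more whitespace characters and then READY. It uses a positive look-ahead, so the trailing whitespace and READY are checked but not included in the match.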

(\S+ \S+)(?=\s{2,}READY)
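A minimal sketch of how this pattern could be used with Python's `re` module, assuming the sample string from the question. `re.search` is used here; `re.match` would also work because the text starts at position 0.

```python
import re

s = '1df34343 43434sebb              READY                     '

# (\S+ \S+)        -> two runs of non-whitespace separated by one space (group 1)
# (?=\s{2,}READY)  -> look-ahead: 2+ whitespace chars, then READY;
#                     it must match, but it is not consumed or captured
result = re.search(r'(\S+ \S+)(?=\s{2,}READY)', s)

print(result.group(1))  # 1df34343 43434sebb
print(result.group(0))  # same text: the look-ahead adds nothing to the match
```

Note that this assumes the first field is exactly two space-separated tokens; with `re.search`, a three-token field would only have its last two tokens captured.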
garyh

If you work with regular expressions, you should know the following:

  • \s : whitespace characters
  • \S : non-whitespace characters
  • + : one or more of the preceding element.
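A quick sketch of these three pieces in isolation, assuming the sample string from the question, using `re.findall` to list every match:

```python
import re

s = '1df34343 43434sebb              READY                     '

# \S+ : one or more non-whitespace characters, i.e. each "word"
words = re.findall(r'\S+', s)
print(words)      # ['1df34343', '43434sebb', 'READY']

# \s+ : one or more whitespace characters; each run of spaces is a single match
runs = re.findall(r'\s+', s)
print(len(runs))  # 3 (the single space, the gap before READY, the trailing spaces)
```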

script:

>>> import re
>>> s = '1df34343 43434sebb              READY                     '
>>> ms = re.match(r"(\S+ \S+)\s+(\S+)\s+", s)
>>> ms.groups()
('1df34343 43434sebb', 'READY')
>>> ms.group(1)
'1df34343 43434sebb'
>>> ms.group(2)
'READY'

You can also nest groups if you ever need a more detailed parse of the string:

>>> ms = re.match(r"((\S+) (\S+))\s+(\S+)\s+", s)
>>> ms.groups()
('1df34343 43434sebb', '1df34343', '43434sebb', 'READY')
>>> ms.group(1)
'1df34343 43434sebb'
>>> ms.group(2)
'1df34343'
>>> ms.group(3)
'43434sebb'
>>> ms.group(4)
'READY'
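A variation that is not part of the original answer: Python's `re` also supports named groups, `(?P<name>...)`, which can make a detailed parse easier to read than numeric indices. The group names `field`, `id1`, `id2` and `status` below are hypothetical, chosen only for illustration.

```python
import re

s = '1df34343 43434sebb              READY                     '

# Same structure as the nested-group pattern above, but every group is named.
# Numeric access (ms.group(1), ...) still works alongside the names.
pattern = re.compile(r'(?P<field>(?P<id1>\S+) (?P<id2>\S+))\s+(?P<status>\S+)\s+')
ms = pattern.match(s)

print(ms.group('field'))   # 1df34343 43434sebb
print(ms.group('status'))  # READY
print(ms.groupdict())      # all named groups as a dict
```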
Inbar Rose

Here is a very simple regex that captures everything until it sees two whitespace characters in a row:

In [10]: import re

In [11]: s = '1df34343 43434sebb              READY                     '

In [12]: re.match(r'(.*?)\s\s', s).groups()
Out[12]: ('1df34343 43434sebb',)

This captures your requirements as I've understood them. If something is amiss, please clarify.
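The `?` in `(.*?)` matters here. A small sketch, assuming the sample string from the question, comparing the non-greedy pattern with a greedy `(.*)` version:

```python
import re

s = '1df34343 43434sebb              READY                     '

# Non-greedy: .*? stops at the FIRST position followed by two whitespace chars
lazy = re.match(r'(.*?)\s\s', s).group(1)

# Greedy: .* runs to the end of the string and backs off only until \s\s
# can match, which happens at the last two trailing spaces
greedy = re.match(r'(.*)\s\s', s).group(1)

print(repr(lazy))         # '1df34343 43434sebb'
print(greedy == s[:-2])   # True: the greedy group swallows READY as well
```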

NPE

Match everything before the first run of two or more whitespace characters:

 re.compile(r'^(.*?)(?:\s{2,})')

Example:

>>> import re
>>> multispace = re.compile(r'^(.*?)(?:\s{2,})')
>>> multispace.match('1df34343 43434sebb              READY                     ').groups()
('1df34343 43434sebb',)
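A compiled pattern is handy when the same parse runs over many lines. One thing to watch: `match` returns `None` when there is no run of two or more whitespace characters, so calling `.groups()` directly would raise `AttributeError`. A minimal sketch, where the helper name `first_field` and the second input line are hypothetical:

```python
import re

multispace = re.compile(r'^(.*?)(?:\s{2,})')

def first_field(line):
    """Return the text before the first run of 2+ whitespace chars, or None."""
    m = multispace.match(line)
    return m.group(1) if m else None

lines = [
    '1df34343 43434sebb              READY                     ',
    'no double spaces here',  # hypothetical input with no 2+ whitespace run
]
for line in lines:
    print(first_field(line))
```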
Martijn Pieters

Why not just split your string on runs of two or more spaces? You get a list whose first element is the one you need. You don't really need a complex regex for that:

>>> s = '1df34343 43434sebb              READY                     '
>>> import re
>>> re.split(r'[ ]{2,}', s)[0]
'1df34343 43434sebb'
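For reference, a short sketch of the full split result on the question's string, and of `maxsplit=1`, which stops after the first separator since only the first field is needed:

```python
import re

s = '1df34343 43434sebb              READY                     '

# Splitting on every run of 2+ spaces; the trailing spaces leave an empty last item
parts = re.split(r'[ ]{2,}', s)
print(parts)          # ['1df34343 43434sebb', 'READY', '']

# maxsplit=1 splits only at the first run, leaving the rest untouched
first, rest = re.split(r'[ ]{2,}', s, maxsplit=1)
print(first)          # 1df34343 43434sebb
print(rest.rstrip())  # READY
```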
Rohit Jain
  • What is the benefit of this? I don't think it is more efficient than matching, and you lose some functionality in the long run. – Inbar Rose Nov 28 '12 at 10:47
  • @InbarRose.. Well, I think `split` fits best for the string the OP has posted. It's not that it's good or bad. Even split uses a `regex`, and I do love `Regex` myself. But for this particular case, it seems like overkill to build a regex that matches the complete string when the OP just wants the first part. – Rohit Jain Nov 28 '12 at 10:50