regex findall to retrieve a substring based on start and end character

Question

I have the following string:

6[Sup. 1e+02]

I'm trying to retrieve just the substring 1e+02. The variable first holds the string above. Here is what I have tried:

re.findall(' \d*]', first)

Wiktor Stribiżew · Accepted Answer · 2015-04-28T21:23:58.623

2

You need to use the following regex:

\b\d+e\+\d+\b

Explanation:

\b - Word boundary
\d+ - Digits, 1 or more
e - Literal e
\+ - Literal +
\d+ - Digits, 1 or more
\b - Word boundary
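
The token-by-token breakdown above can also be written into the pattern itself. A minimal sketch, assuming Python 3 and the standard re.VERBOSE flag; the names pattern and matches are only illustrative:

```python
import re

# Same pattern as above, written with re.VERBOSE so each token carries its own comment.
pattern = re.compile(r"""
    \b      # word boundary
    \d+     # one or more digits (mantissa)
    e       # literal 'e'
    \+      # literal '+'
    \d+     # one or more digits (exponent)
    \b      # word boundary
""", re.VERBOSE)

matches = pattern.findall("6[Sup. 1e+02]")
print(matches)  # ['1e+02']
```

In verbose mode, whitespace and # comments inside the pattern are ignored, so the compiled regex behaves like the one-line version.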


Sample code:

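import re
# Raw string (r'...') keeps backslashes literal. The ur'...' prefix is Python 2 only;
# Python 3 strings are already Unicode.
p = re.compile(r'\b\d+e\+\d+\b')
test_str = "6[Sup. 1e+02]"
print(p.findall(test_str))  # ['1e+02']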


edited Apr 28 '15 at 21:23

answered Apr 28 '15 at 21:03

Wiktor Stribiżew


@BillGates: It can be neater, but you only provided one example. I guess all these values are inside square brackets, and they are located right before the closing one. If not, try just `\b\d+e\+\d+\b`. – Wiktor Stribiżew Apr 28 '15 at 21:10
`r` marks a raw string, so backslashes do not have to be doubled when building regexes; `u` marks a Unicode string. Please check http://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-flags-do-in-python-and-what-are-raw-string-l for more details. – Wiktor Stribiżew Apr 28 '15 at 21:17
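
The comments above guess that the values sit right before a closing square bracket. A minimal sketch of how that assumption could be encoded with a lookahead, `(?=\])`. This variant is only an illustration and not necessarily the regex the answer used before it was edited; the name bracketed and the second sample string are made up:

```python
import re

# Assumption: the number is immediately followed by a closing bracket.
# (?=\]) checks for the bracket without including it in the match.
bracketed = re.compile(r"\b\d+e\+\d+(?=\])")

print(bracketed.findall("6[Sup. 1e+02]"))           # ['1e+02']
print(bracketed.findall("1e+02 and [Sup. 3e+10]"))  # ['3e+10']
```

If the numbers can appear anywhere in the string, the plain `\b\d+e\+\d+\b` suggested in the comment is the less restrictive choice.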

score 1 · Answer 2 · answered Apr 28 '15 at 21:17

import re
first = "6[Sup. 1e+02]"
# Capture everything between the run of whitespace and the closing bracket.
result = re.findall(r"\s+(.*?)\]", first)
print(result)

Output:

['1e+02']

Demo http://ideone.com/Kevtje

regex Explanation:

\s+(.*?)\]

Match a single character that is a “whitespace character” (ASCII space, tab, line feed, carriage return, vertical tab, form feed) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 1 «(.*?)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match the character “]” literally «\]»

And what if the input string has more `[]`s with no number in them, e.g. `6[Sup. 1e+02] [lorem ipsum]`? http://ideone.com/mh233v – Wiktor Stribiżew Apr 28 '15 at 21:22
The OP said: "I have the following string: `6[Sup. 1e+02]`" – Pedro Lobito Apr 28 '15 at 21:23
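
To make the concern in the comments above concrete, here is a small sketch (Python 3) that runs both regexes from this page on the two-bracket input from the comment. The names text, generic, and specific are only illustrative:

```python
import re

text = "6[Sup. 1e+02] [lorem ipsum]"  # input taken from the comment above

# Whitespace-to-bracket capture: grabs any text that precedes a "]".
generic = re.findall(r"\s+(.*?)\]", text)
# Number-shaped pattern: only matches digits, 'e', '+', digits.
specific = re.findall(r"\b\d+e\+\d+\b", text)

print(generic)   # ['1e+02', '[lorem ipsum']
print(specific)  # ['1e+02']
```

Which one fits depends on the real data: for the single string in the question both return the same result.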
