56

I have a sample string <alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] ...>, created=1324336085, description='Customer for My Test App', livemode=False>

I only want the value cus_Y4o9qMEZAugtnW and NOT card (which is inside another [])

How could I do it in easiest possible way in Python? Maybe by using RegEx (which I am not good at)?

Timur Shtatland
user993563

8 Answers

108

How about:

import re

s = "alpha.Customer[cus_Y4o9qMEZAugtnW] ..."
m = re.search(r"\[([A-Za-z0-9_]+)\]", s)
print(m.group(1))

For me this prints:

cus_Y4o9qMEZAugtnW

Note that the call to re.search(...) finds the first match to the regular expression, so it doesn't find the [card] unless you repeat the search a second time.

Edit: The regular expression here is a Python raw string literal, which means backslashes are not treated as escape characters by Python and are passed through to re.search() unchanged. The parts of the regular expression are:

  1. \[ matches a literal [ character
  2. ( begins a new group
  3. [A-Za-z0-9_] is a character set matching any letter (capital or lower case), digit or underscore
  4. + matches the preceding element (the character set) one or more times.
  5. ) ends the group
  6. \] matches a literal ] character
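The six parts listed above can also be written out with re.VERBOSE, which lets each piece carry its own comment. This is only a sketch: the name BRACKETED and the shortened sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# BRACKETED is a hypothetical name chosen for this sketch.
BRACKETED = re.compile(r"""
    \[              # a literal [ character
    (               # begin a capturing group
    [A-Za-z0-9_]+   # one or more letters, digits or underscores
    )               # end the group
    \]              # a literal ] character
""", re.VERBOSE)

s = "<alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] ...>"
m = BRACKETED.search(s)
# search() returns None when nothing matches, so guard before calling group().
first_value = m.group(1) if m else None
print(first_value)  # cus_Y4o9qMEZAugtnW
```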

Edit: As D K has pointed out, the regular expression could be simplified to:

m = re.search(r"\[(\w+)\]", s)

since \w is a special sequence that means the same thing as [a-zA-Z0-9_] for ASCII matching; with re.UNICODE (the default for str patterns in Python 3) or re.LOCALE it also matches other word characters.
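To see how the flag settings change what \w accepts, here is a small sketch for Python 3. The sample string is made up for illustration; re.ASCII is the standard flag that restricts \w to [a-zA-Z0-9_].

```python
import re

# Made-up sample: one bracketed value with a non-ASCII letter, one without.
s = "[café] [plain_1]"

# In Python 3, \w on a str pattern is Unicode-aware by default.
unicode_words = re.findall(r"\[(\w+)\]", s)
# With re.ASCII, \w only matches [a-zA-Z0-9_], so "café" no longer qualifies.
ascii_words = re.findall(r"\[(\w+)\]", s, re.ASCII)

print(unicode_words)  # ['café', 'plain_1']
print(ascii_words)    # ['plain_1']
```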

srgerg
  • 1
    Could you please explain the regex part of your answer, so that I do not have to ask again for similar problems? Thanks. – user993563 Dec 20 '11 at 00:15
  • I've edited my answer with an explanation of the regular expression and links to the python regular expression documentation. – srgerg Dec 20 '11 at 00:26
  • 7
    Why not replace `[A-Za-z0-9_]` with `\w`? – D K Dec 20 '11 at 00:30
  • Yes, you could replace `[A-Za-z0-9_]` with `\w`. – srgerg Dec 20 '11 at 00:34
  • how could plus and minus signs '+' and '-' be added to character sets? i tried using [A-Za-z0-9_\+\-] but it didnt work!! – Masih Jan 18 '16 at 06:23
  • 2
    @user3015703 In a character set you don't need to escape special characters, except for '-' or ']'. To include a dash you can either precede it with a backslash, or make it the first or last character in the set. So using '[A-Za-z0-9_+-]' should work. See the [Python regular expression syntax documentation](https://docs.python.org/3/library/re.html#regular-expression-syntax) – srgerg Jan 20 '16 at 22:35
  • you can use `re.findall(r"\[([A-Za-z0-9_]+)\]", s)` to find all the occurrences. – hamed Mar 04 '16 at 08:56
  • 1
    This doesn't seem to work if the expression in the brackets has a `.` in it. Any ideas for how to make that work? – Matt Jan 22 '18 at 17:36
25

You could use str.split to do this.

s = "<alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card]\
 ...>, created=1324336085, description='Customer for My Test App',\
 livemode=False>"
val = s.split('[', 1)[1].split(']')[0]

Then we have:

>>> val
'cus_Y4o9qMEZAugtnW'
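For readers who want the one-liner unpacked, the two splits can be done step by step. This is a sketch, not part of the original answer: the sample string is shortened, and the str.partition variant at the end is an added assumption about how one might handle input with no brackets.

```python
s = "<alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] ...>"

# Split once at the first "[": the text before it, and everything after it.
before, after = s.split('[', 1)
# Split the remainder at "]" and keep the piece before the first "]".
val = after.split(']')[0]
print(val)  # cus_Y4o9qMEZAugtnW

# s.split('[', 1)[1] raises IndexError when there is no "[" at all.
# str.partition returns empty strings instead, so it can be checked safely.
_, sep, rest = "no brackets here".partition('[')
missing = rest.partition(']')[0] if sep else None
print(missing)  # None
```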
David Alber
  • Yeah, it depends on how much messy the strings are, but a split could work here too.. – redShadow Dec 20 '11 at 00:10
  • Though it won't affect much, which of the two, regex or split, is more efficient? Also, could you please explain the splitting part? Thanks. – user993563 Dec 20 '11 at 00:16
  • @user993563 Have a look at the link to `str.split` in the answer for examples. Briefly, the first `split` in the solution returns a list of length two; the first element is the substring before the first `[`, the second is the substring after it. The second `split` then cuts that remainder at `]`, and `[0]` keeps the part before the first `]`. As for performance, you should measure that to find out (look at [`timeit`](http://docs.python.org/library/timeit.html)). If you plan to do the value extraction several times in one run of the program and decide to use regular expressions, you might want to [`compile`](http://docs.python.org/library/re.html#re.compile) the regex. – David Alber Dec 20 '11 at 00:33
  • 1
    @user993563 Note that your request for the "easiest possible way in python" may be at odds with the performance consideration. I chose to use `split` because I felt that reflected your request for "easiest". – David Alber Dec 20 '11 at 00:45
  • simple to understand, better than re. – Shaida Muhammad Dec 08 '22 at 17:25
20

This should do the job:

re.match(r"[^[]*\[([^]]*)\]", yourstring).groups()[0]
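Because re.match returns None when the string contains no bracket pair, calling .groups() directly can raise AttributeError. A guarded sketch follows; the helper name first_bracketed is hypothetical, and the brackets inside the character classes are escaped here, which matches the same text as the pattern above.

```python
import re

def first_bracketed(text):
    """Return the text inside the first [...] pair, or None if there is none."""
    # [^\[]* skips everything up to the first "[",
    # ([^\]]*) captures everything up to the next "]".
    m = re.match(r"[^\[]*\[([^\]]*)\]", text)
    return m.group(1) if m else None

print(first_bracketed("<alpha.Customer[cus_Y4o9qMEZAugtnW] x=<alpha.AlphaObject[card] ...>"))
print(first_bracketed("nothing to see"))
```

Since the capture group accepts any character except "]", this version also handles values containing dots, spaces or dashes.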
redShadow
  • I guess this was more than the OP needed since his case only needed alphanumerics, but this did the trick for me. Thanks! – extarbags Oct 24 '13 at 16:12
  • This would be really slow as we are checking a llllllotttt of stuff. therefore I would rather suggest using "\[(.*?)\]", follow my answer below if the brackets aren't shown here properly. – PanDe Jan 04 '21 at 10:55
8
your_string = "lnfgbdgfi343456dsfidf[my data] ljfbgns47647jfbgfjbgskj"
your_string[your_string.find("[")+1 : your_string.find("]")]
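One caveat with the slice above: str.find returns -1 when the character is absent, so the slice silently yields a wrong substring instead of failing. A more defensive sketch, with between_brackets as a hypothetical helper name:

```python
def between_brackets(text):
    """Return the text between the first "[" and the next "]", or None."""
    start = text.find("[")
    if start == -1:          # no opening bracket at all
        return None
    # Search for "]" only after the opening bracket.
    end = text.find("]", start + 1)
    if end == -1:            # opening bracket was never closed
        return None
    return text[start + 1:end]

print(between_brackets("lnfgbdgfi343456dsfidf[my data] ljfbgns47647jfbgfjbgskj"))  # my data
```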

courtesy: Regular expression to return text between parenthesis

Community
OmaL
5

You can also use

re.findall(r"\[([A-Za-z0-9_]+)\]", string)

if there are many occurrences that you would like to find.
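Applied to a shortened version of the string from the question, this might look as follows (a sketch; the variable names are illustrative):

```python
import re

s = "<alpha.Customer[cus_Y4o9qMEZAugtnW] active_card=<alpha.AlphaObject[card] ...>"

# With one capturing group, findall returns a list of the captured strings.
values = re.findall(r"\[([A-Za-z0-9_]+)\]", s)
print(values)  # ['cus_Y4o9qMEZAugtnW', 'card']
```

Taking values[0] then gives the first bracketed value, which is what the question asks for.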

See also for more info: How can I find all matches to a regular expression in Python?

Community
5

You can use

import re

s = re.search(r"\[.*?]", string)
if s:
    print(s.group(0))
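Note that group(0) is the whole match, so the printed value still includes the square brackets. If only the inner text is wanted, one option is to add a capturing group and read group(1). A sketch on a shortened sample string:

```python
import re

s = "<alpha.Customer[cus_Y4o9qMEZAugtnW] ...>"

# Same non-greedy pattern, with parentheses around the part to keep.
m = re.search(r"\[(.*?)]", s)
if m:
    whole = m.group(0)  # the full match, brackets included
    inner = m.group(1)  # only the text between the brackets
    print(whole)  # [cus_Y4o9qMEZAugtnW]
    print(inner)  # cus_Y4o9qMEZAugtnW
```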
Wiktor Stribiżew
shubham
2

How about this? Example illustrated using a file:

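import re

# Read the log line by line; the with block closes the file afterwards.
with open('abc.log', 'r') as f:
    for line in f:
        m = re.search(r"\[(.*?)\]", line)
        if m:  # skip lines that contain no [...] pair
            print(m.group(1))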

Hope this helps:

Magic regex : \[(.*?)\]

Explanation:

\[ : [ is a meta char and needs to be escaped if you want to match it literally.

(.*?) : match everything in a non-greedy way and capture it.

\] : ] is a meta char and needs to be escaped if you want to match it literally.

PanDe
1

This snippet should work too; it returns every bracketed value made up only of letters, digits, spaces, dots and underscores (including empty brackets "[]"):

re.findall(r"\[([a-zA-Z0-9 ._]*)\]", your_text)
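A quick sketch of what this character set does and does not accept. The sample text is made up, and the two variations after the original pattern are assumptions about what a reader might want: one adds + and - to the set (with - last so it is read literally), the other accepts anything except a closing bracket.

```python
import re

your_text = "v[1.2.3] name[my data] op[a+b-c] empty[]"

# Original set: letters, digits, space, dot, underscore. "*" also allows "[]".
basic = re.findall(r"\[([a-zA-Z0-9 ._]*)\]", your_text)
# Adding "+" and "-" to the set; "-" goes last so it is not read as a range.
with_signs = re.findall(r"\[([a-zA-Z0-9 ._+-]*)\]", your_text)
# Anything except "]" between the brackets.
anything = re.findall(r"\[([^\]]*)\]", your_text)

print(basic)       # ['1.2.3', 'my data', '']
print(with_signs)  # ['1.2.3', 'my data', 'a+b-c', '']
print(anything)    # ['1.2.3', 'my data', 'a+b-c', '']
```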