I have a string of the following form: PW[Yasui Chitetsu]; and would like to get only the name inside the brackets: Yasui Chitetsu. I'm trying something like

[^(PW\[)](.*)[^\]]

as a regular expression, but the last bracket is still in it. How do I exclude it? I don't think I need anything fancy like lookbehinds for this case.

Philippe Fanaro
  • Actually your problem is the other way around. This regex matches `asui Chitets` – DeepSpace Dec 16 '18 at 21:36
  • See [Python: How to get multiple elements inside square brackets](https://stackoverflow.com/questions/9403275) and [Regular expression to extract text between square brackets](https://stackoverflow.com/questions/2403122) – Wiktor Stribiżew Dec 16 '18 at 21:36
  • It looks like you can massively simplify the regex to `PW\[(.*)\]` which will match `Yasui Chitetsu` in this case – DeepSpace Dec 16 '18 at 21:38
  • @DeepSpace, `PW\[(.*)\]` is giving me the full `PW[Yasui Chitetsu]` in Python and when I try it in Atom. Are you using some other standard for regexes? – Philippe Fanaro Dec 16 '18 at 21:41
  • @PhilippeFanaro If you are using `match` or `search` you should then get the first group, ie `re.match(r'PW\[(.*)\]', 'PW[Yasui Chitetsu]').group(1)` – DeepSpace Dec 16 '18 at 21:44
  • @DeepSpace, you're right, your solution also works, thank you. – Philippe Fanaro Dec 16 '18 at 21:50

1 Answer

The Problems with What You've Tried

There are a few problems with what you've tried:

  • The leading [^(PW\[)] is a character class, not a group: it matches one character that is not (, P, W, [ or ). Together with the trailing [^\]], this leaves the first and last characters of the name out of the group, giving you something like asui Chitets.
  • It goes further wrong when the name itself starts with P or W. For example, in PW[Paul McCartney], the group captures only ul McCartne and the full match is aul McCartney.
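Both points can be checked by running the pattern from the question and printing the full match next to group 1. This is only a minimal sketch; the helper name `show` is made up for illustration.

```python
import re

# The pattern from the question. The leading [^(PW\[)] is a character class:
# it matches ONE character that is not '(', 'P', 'W', '[' or ')'.
pattern = re.compile(r"[^(PW\[)](.*)[^\]]")

def show(text):
    """Return (full match, group 1) for the first match in text."""
    m = pattern.search(text)
    return m.group(0), m.group(1)

print(show("PW[Yasui Chitetsu]"))  # ('Yasui Chitetsu', 'asui Chitets')
print(show("PW[Paul McCartney]"))  # ('aul McCartney', 'ul McCartne')
```

In the second call the scan skips the leading P of Paul as well, because P is one of the characters the class excludes.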

The Regex

You want something like this:

(?<=\[)([^]]+)(?=\])

Here's a regex101 demo.

Explanation

(?<=\[) is a lookbehind: the match must be preceded by [, but the [ itself is not part of the match

([^]]+) matches 1 or more characters that are not ]

(?=\]) is a lookahead: the match must be followed by ], but the ] itself is not part of the match
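As a rough illustration of how the three parts work together, the sketch below runs the pattern over a made-up string containing two bracketed names with different prefixes. The sample string and variable names are assumptions for the example, not taken from the question.

```python
import re

# Lookarounds check that the brackets are present without consuming them,
# so the full match and group 1 are the same text.
bracketed = re.compile(r"(?<=\[)([^]]+)(?=\])")

# Hypothetical input with two bracketed names and different prefixes.
text = "PW[Yasui Chitetsu] something AB[Another Name]"

names = bracketed.findall(text)
print(names)  # ['Yasui Chitetsu', 'Another Name']

first = bracketed.search(text)
print(first.group(0) == first.group(1))  # True
```

Because [^]]+ stops at the first ], each name is matched separately rather than everything between the first [ and the last ].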

Sample Code

Here's some sample code (from the above regex101 link):

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?<=\[)([^]]+)(?=\])"

test_str = "PW[Yasui Chitetsu]"

matches = re.finditer(regex, test_str)

# Number matches and groups from 1, as regex101 does.
for match_num, match in enumerate(matches, start=1):
    print("Match {} was found at {}-{}: {}".format(match_num, match.start(), match.end(), match.group()))

    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(group_num, match.start(group_num), match.end(group_num), match.group(group_num)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Semicolons

In your title, you mentioned finding text between semicolons. The same logic would work for that, giving you this regex:

(?<=;)([^;]+)(?=;)
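
One place where the lookarounds make a visible difference is when fields share a delimiter. The sketch below compares the regex above with a plain capturing version on a made-up string; the sample text and variable names are assumptions for illustration.

```python
import re

# Hypothetical semicolon-delimited input.
text = "x;Yasui Chitetsu;Another Name;y"

# Lookaround version: the semicolons are checked but not consumed,
# so one semicolon can end a field and also start the next one.
with_lookarounds = re.findall(r"(?<=;)([^;]+)(?=;)", text)
print(with_lookarounds)  # ['Yasui Chitetsu', 'Another Name']

# Plain capturing version: each match consumes both of its semicolons,
# so the field that shares a semicolon with the previous match is skipped.
consuming = re.findall(r";([^;]+);", text)
print(consuming)  # ['Yasui Chitetsu']
```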
elixenide
  • Why so complicated? `PW\[(.*)\]` – DeepSpace Dec 16 '18 at 21:45
  • Nice, thank you very much. The semicolons thing was a similar problem in another piece of code. The tutorial you've mentioned also seems very helpful. – Philippe Fanaro Dec 16 '18 at 21:46
  • @DeepSpace Because OP didn't specify that the `PW` was necessarily constant or that there is only one such group in the string. My example is general. Your example would break on, for example, `AB[Yasui Chitetsu]` or `PW[Yasui Chitetsu] something PW[Another Name]`. Better to err on the side of being more precise. – elixenide Dec 16 '18 at 21:50
  • @PhilippeFanaro Glad it helped! Please remember to accept the answer. :) – elixenide Dec 16 '18 at 21:51
  • Both solutions will work in my case (there are in fact some lines with different prefixes before the brackets), but Ed's seems better indeed. Thanks to you two. – Philippe Fanaro Dec 16 '18 at 21:53
  • @EdCottrell: I think you are right. You taught me a little about regexes I did not know yet (I don't use them very often, but might use them more, now). Thanks. – Rudy Velthuis Dec 16 '18 at 22:21