The Problems with What You've Tried
There are a few problems with what you've tried:
- It will omit the first and last characters of your match from the group, giving you something like "asui Chitets".
- It will have even more errors on strings that start with P or W. For example, in PW[Paul McCartney], you would match only "ul McCartne" with the group and "ul McCartney" with the full match.
The Regex
You want something like this:
(?<=\[)([^]]+)(?=\])
Here's a regex101 demo.
Explanation
- (?<=\[) means that the match must be preceded by [
- ([^]]+) matches 1 or more characters that are not ]
- (?=\]) means that the match must be followed by ]
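As a quick check of the three pieces working together, here is a minimal sketch that runs the pattern against both example strings from this answer. The names BRACKET_RE and bracket_contents are made up for illustration; only the regex itself comes from the answer.

```python
import re

# Pattern from this answer: lookbehind for "[", one or more non-"]" characters,
# lookahead for "]". BRACKET_RE is a hypothetical name used only in this sketch.
BRACKET_RE = re.compile(r"(?<=\[)([^]]+)(?=\])")


def bracket_contents(text):
    """Return every run of text that sits between "[" and "]" in text."""
    # With exactly one capture group, findall returns that group's contents.
    return BRACKET_RE.findall(text)


print(bracket_contents("PW[Yasui Chitetsu]"))  # ['Yasui Chitetsu']
print(bracket_contents("PW[Paul McCartney]"))  # ['Paul McCartney']
```

Because the brackets are only asserted by the lookarounds and never consumed, the full match and group 1 are the same text, so the leading P in Paul is no longer lost.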
Sample Code
Here's some sample code (from the above regex101 link):
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re

regex = r"(?<=\[)([^]]+)(?=\])"
test_str = "PW[Yasui Chitetsu]"

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
Semicolons
In your title, you mentioned finding text between semicolons. The same logic would work for that, giving you this regex:
(?<=;)([^;]+)(?=;)
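Here is a minimal sketch of the semicolon version in Python. The input string and the names SEMICOLON_RE and between_semicolons are invented for illustration and are not from your question.

```python
import re

# Semicolon variant of the pattern above. SEMICOLON_RE is a hypothetical name.
SEMICOLON_RE = re.compile(r"(?<=;)([^;]+)(?=;)")


def between_semicolons(text):
    """Return every non-empty run of text that has a ";" on both sides."""
    return SEMICOLON_RE.findall(text)


print(between_semicolons("a;b;c;d"))  # ['b', 'c']
```

Note that a and d are not returned, because each has a semicolon on only one side. Since the lookarounds do not consume the semicolons, one semicolon can close one match and open the next, which is why both b and c are found.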