Get the string between [ brackets and special characters in python

Question

I have a really similar question to this one.

And I really wonder why my result is NaN.

I have a dataframe with this column:

Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush

and I want to build a new column with the cards that were played:

[J♡, K♧]
[5, 2]

or even:

[J, K]
[5, 2]

However, when I experiment with regex and use: dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)

I only get NaN.

Perhaps add them to the character class with a comma and a space `\[([A-Za-z0-9_♤♡♢♧, ]+)\]` — The fourth bird, Feb 05 '21 at 14:10
What about `df['Action'].str.extract(r'\[([^][]+)]', expand=False).str.findall(r'[A-Za-z0-9]+')`? — Wiktor Stribiżew, Feb 05 '21 at 14:13
@Thefourthbird that was the problem, I had to include the special characters as well :) — PV8, Feb 05 '21 at 14:15
You might make it a bit more specific `\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)\]` https://regex101.com/r/LjHP4O/1 — The fourth bird, Feb 05 '21 at 14:21

score 1 · Accepted Answer · answered Feb 14 '21 at 10:46

You can add the suit symbols, the comma, and the space to the character class in the capture group of your pattern, giving \[([A-Za-z0-9_♤♡♢♧, ]+)\], or make the pattern a bit more specific:

\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]

The pattern matches:

\[ Match [
( Capture group 1
- [A-Za-z0-9_] Match one of the listed chars
- [♤♡♢♧]? Optionally match one of the listed chars
- ,\s*[A-Za-z0-9_][♤♡♢♧]? Match a comma, optional whitespace, and then a second card using the same logic as before the comma
) Close group 1
] Match ]
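
To check the breakdown above outside pandas, here is a minimal sketch using the standard `re` module on the two sample rows from the question. The names `CARD_PAIR`, `ORIGINAL`, and `samples` are only illustrative. It also shows why the pattern from the question finds nothing: its character class allows neither the comma, the space, nor the suit symbols.

```python
import re

# The more specific pattern described above.
CARD_PAIR = re.compile(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]')
# The pattern from the question, which only allows letters, digits and "_".
ORIGINAL = re.compile(r'\[([A-Za-z0-9_]+)\]')

samples = [
    'Player[J♡, K♧] won the $5.40 main pot with a Straight',
    'Player [5, 2] won the $21.00 main pot with a flush',
]

for text in samples:
    specific = CARD_PAIR.search(text)
    original = ORIGINAL.search(text)
    # The specific pattern captures the card pair; the original finds no match.
    print(specific.group(1), '|', original)
```

Note that each rank is matched as a single character, so a rank written as `10` would not match this pattern as it stands.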

Regex demo

For example

import pandas as pd

dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)

Output

                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  J♡, K♧
1  Player [5, 2] won the $21.00 main pot with a f...    5, 2
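
If the suit-free form from the question (`[J, K]`) is preferred, the two-step idea from the comments (take the text between the brackets, then collect the alphanumeric runs) can be sketched with plain `re`. The helper name `card_ranks` is hypothetical and only used for illustration.

```python
import re

def card_ranks(action):
    """Return the ranks inside the first [...] of `action`, or None if there is none."""
    inner = re.search(r'\[([^][]+)]', action)  # text between the brackets
    if inner is None:
        return None
    # Keep only letters and digits, which drops suits, commas and spaces.
    return re.findall(r'[A-Za-z0-9]+', inner.group(1))

print(card_ranks('Player[J♡, K♧] won the $5.40 main pot with a Straight'))
print(card_ranks('Player [5, 2] won the $21.00 main pot with a flush'))
print(card_ranks('Player folded'))
```

In pandas the same two steps should correspond to chaining `.str.extract(...)` and `.str.findall(...)`, as in the comment under the question.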

score 0 · Answer 2 · answered Feb 05 '21 at 14:10

Try this pattern (I assumed that the text uses () instead of [], as in the posted regex demo):

\([^,]+,[^\)]+\)

Explanation:

\( - match ( literally

[^,]+ - match one or more characters other than ,

, - match , literally

[^\)]+ - match one or more characters other than )

\) - match ) literally

Regex demo

Michał Turczyn

for me this leads to the error: ValueError: pattern contains no capture groups, the comment above solved the error, I had to include special characters, thx anyway – PV8 Feb 05 '21 at 14:16
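
The `ValueError` in the comment above is consistent with `str.extract` needing at least one capture group, which the pattern in this answer does not have. A hedged sketch with plain `re`: the same pattern, adapted to square brackets (my adaptation, not part of the answer), with and without parentheses around the part to extract. The variable names are illustrative.

```python
import re

# The answer's pattern adapted to square brackets, still without a capture group.
no_group = re.compile(r'\[[^,]+,[^\]]+\]')
# The same pattern with parentheses around the part to extract.
with_group = re.compile(r'\[([^,]+,[^\]]+)\]')

text = 'Player[J♡, K♧] won the $5.40 main pot with a Straight'

print(no_group.groups)    # number of capture groups in the first pattern
print(with_group.groups)  # number of capture groups in the second pattern
print(with_group.search(text).group(1))  # the text captured between the brackets
```

The first pattern reports zero groups, so there is nothing for an extract-style call to return; the second reports one group and captures the card pair.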

score 0 · Answer 3 · answered Feb 14 '21 at 01:03

Use

>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  [J, K]
1  Player [5, 2] won the $21.00 main pot with a f...  [5, 2]
>>>

Regex: (\w+)(?=[^][]*])

EXPLANATION

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^][]*                   any character except: ']', '[' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ]                        ']'
--------------------------------------------------------------------------------
  )                        end of look-ahead
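
The lookahead explained above can be tried with plain `re.findall`, which returns the captured group for every match. This is a small sketch on shortened versions of the sample rows; the last input is a made-up string that illustrates one limitation of the approach.

```python
import re

pattern = re.compile(r'(\w+)(?=[^][]*])')

print(pattern.findall('Player[J♡, K♧] won the $5.40 main pot'))
print(pattern.findall('Player [5, 2] won the $21.00 main pot'))

# The lookahead only asks for a later ']' with no bracket in between,
# so it does not verify that an opening '[' came first.
print(pattern.findall('stray] bracket'))
```

For the two sample rows this yields the ranks only, because the suit symbols are not word characters. The third call returns `['stray']`, so the pattern is best suited to text where every `]` is known to close a card list.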
