-2

I have a really similar question to this one.

And I really wonder why my result is NaN.

I have a dataframe with this column:

Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush

and I want to build a new column with the cards that were played:

[J♡, K♧]
[5, 2]

or even:

[J, K]
[5, 2]

However, when I experiment with regex and use: dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)

I only get NaN.

PV8

3 Answers

1

You can add the missing characters (the suit symbols, the comma and the space) to the character class in the capture group of your pattern, \[([A-Za-z0-9_♤♡♢♧, ]+)\], or make the pattern a bit more specific:

\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]

The pattern matches:

  • \[ Match [
  • ( Capture group 1
    • [A-Za-z0-9_] Match one of the listed chars
    • [♤♡♢♧]? Optionally match one of the listed chars
    • ,\s*[A-Za-z0-9_][♤♡♢♧]? Match a comma and the same logic as before the comma
  • ) Close group 1
  • ] Match ]
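
As a quick check of the first option (widening the character class), here is a minimal sketch that uses the standard re module rather than pandas. The helper first_group is a hypothetical name introduced only for this illustration; it mimics what str.extract does per row, where a row without a match comes back as NaN.

```python
import re

actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# Pattern from the question: the class has no suit symbols, comma or space,
# so the match fails before the closing bracket is reached.
original = re.compile(r"\[([A-Za-z0-9_]+)\]")

# Widened class: suit symbols, comma and space are now allowed inside [...].
widened = re.compile(r"\[([A-Za-z0-9_♤♡♢♧, ]+)\]")


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


original_results = [first_group(original, a) for a in actions]
widened_results = [first_group(widened, a) for a in actions]

print(original_results)
print(widened_results)
```

The original pattern fails on both rows because the text between the brackets contains characters outside the class, which is why the column in the question is filled with NaN.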


For example:

import pandas as pd

dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)

Output

                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  J♡, K♧
1  Player [5, 2] won the $21.00 main pot with a f...    5, 2
The fourth bird
0

Try this pattern (I assumed that you use () in the text instead of [], as was posted in the regex demo):

\([^,]+,[^\)]+\)

Explanation:

\( - match ( literally

[^,]+ - match one or more characters other than ,

, - match , literally

[^\)]+ - match one or more characters other than )

\) - match ) literally
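
Since the sample data in the question uses square brackets, and str.extract needs at least one capture group, the same idea could be adapted as in the sketch below. This is an assumption about how the pattern would be adjusted, not part of the pattern above; the name bracket_pair is made up for the illustration, and the standard re module stands in for pandas.

```python
import re

actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# Same "anything but the delimiter" idea, rewritten for [ ] and wrapped in
# a capture group so that only the text between the brackets is returned.
bracket_pair = re.compile(r"\[([^,\]]+,[^\]]+)\]")

cards = []
for action in actions:
    match = bracket_pair.search(action)
    # Keep None for rows without a bracketed pair.
    cards.append(match.group(1) if match else None)

print(cards)
```

The same pattern string should be usable with dfpot['Action'].str.extract(..., expand=False), because it contains exactly one capture group.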


Michał Turczyn
    For me this leads to the error: ValueError: pattern contains no capture groups. The comment above solved the error; I had to include the special characters. Thanks anyway. – PV8 Feb 05 '21 at 14:16
0

Use

>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  [J, K]
1  Player [5, 2] won the $21.00 main pot with a f...  [5, 2]
>>> 

Regex: (\w+)(?=[^][]*])

EXPLANATION

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^][]*                   any character except: ']', '[' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ]                        ']'
--------------------------------------------------------------------------------
  )                        end of look-ahead
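
To see the lookahead at work outside pandas, here is a small sketch with re.findall. The function name ranks is hypothetical. Note that in Python the suit symbols are not matched by \w, which is why only the ranks survive.

```python
import re

# Same pattern as above: a run of word characters that is followed, without
# crossing another bracket, by a closing ].
CARD_RANK = re.compile(r"(\w+)(?=[^][]*])")


def ranks(text):
    """Return every word-character run located inside a [...] pair."""
    return CARD_RANK.findall(text)


first = ranks("Player[J♡, K♧] won the $5.40 main pot with a Straight")
second = ranks("Player [5, 2] won the $21.00 main pot with a flush")

print(first)
print(second)

# Joining the list gives a plain string instead of a list per row.
joined = ", ".join(first)
print(joined)
```

"Player" is not picked up because the lookahead hits an opening [ before it can reach a closing ].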
Ryszard Czech