-1

I have the following string:

data-event-title="Yuichi Sugita* vs Adrian Mannarino">
                              <span class="odds-container">
                                                             <b class="odds">1/12</b>
                                                                     </a>

I would like to capture Yuichi Sugita and 1/12. For that I created the following regex: data-event-title="(.+)".+ class="odds">(.+)<. It has two capture groups in parentheses, and each works fine when used on its own, but the .+ between them does not work as expected.

Any suggestions are appreciated.

mickmackusa
Nickpick

3 Answers

1

If you want to capture the full text inside data-event-title="" along with 1/12, use this regex:
data\-event\-title\=\"(.+?)\"[^\0]*class\=\"odds\".*\>(.+?)\<
https://regex101.com/r/4loeLv/1

Or

If you want to capture only the first person's name inside data-event-title="", use:
data\-event\-title\=\"(.+?) vs.*?\"[^\0]*class\=\"odds\".*\>(.+?)\<
https://regex101.com/r/4loeLv/2
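
As a rough check, the second pattern can be tried in Python (a language choice assumed here, not stated in this answer). The sketch below drops the redundant backslashes, which does not change what the pattern matches, and uses a trimmed copy of the sample from the question as `html`. Group 1 still carries the trailing asterisk, so it is stripped in a separate step.

```python
import re

# Trimmed copy of the sample markup from the question (indentation shortened).
html = '''data-event-title="Yuichi Sugita* vs Adrian Mannarino">
    <span class="odds-container">
        <b class="odds">1/12</b>
    </a>'''

# [^\0]* matches any character except NUL, so unlike the dot it crosses line breaks.
pattern = r'data-event-title="(.+?) vs.*?"[^\0]*class="odds".*>(.+?)<'

m = re.search(pattern, html)
name = m.group(1).rstrip('*')  # the pattern captures "Yuichi Sugita*"
odds = m.group(2)
print(name, odds)
```

Because `[^\0]*` is greedy, the match settles on the last class="odds" in the input, which is worth keeping in mind if the real page holds several events.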

lkdhruw
1

Your use of dots is "greedy" so they capture as much as they possibly can (and you don't actually want that in this case).

You can change the capture group quantifiers to be "lazy", but it will be more efficient to use negated character classes (syntax [^character]) for your capture groups.

The dot between your two capture groups can stay "greedy": it will run to the end of the input and then backtrack to the last class="odds">, which is fine as long as the input contains only one such occurrence.
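
The difference between the three quantifier styles can be seen on a single line. This is a minimal sketch in Python; the `class="event"` attribute is a made-up addition so that a second double quote appears later on the line.

```python
import re

# Hypothetical one-line input: the class attribute is invented for illustration.
attr = 'data-event-title="Yuichi Sugita* vs Adrian Mannarino" class="event"'

# Greedy: .+ runs to the end of the line, then backtracks to the LAST quote.
greedy = re.search(r'title="(.+)"', attr).group(1)

# Lazy: .+? grows one character at a time and stops at the FIRST quote.
lazy = re.search(r'title="(.+?)"', attr).group(1)

# Negated class: [^"]+ cannot step over a quote at all, so no backtracking is needed.
negated = re.search(r'title="([^"]+)"', attr).group(1)

print(greedy)
print(lazy)
print(negated)
```

The greedy version swallows the start of the next attribute, while the lazy and negated versions both return only the attribute value.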

Assuming your input contains line breaks, as your sample shows, the dot will stop at newline characters unless you use the s flag (re.DOTALL or re.S in Python) with your pattern. Use this:

r"data-event-title=\"([^*]+).*class=\"odds\">([^<]+)"s

This will capture:

  1. the substring that follows data-event-title=" ending just before the first occurrence of *.
  2. the substring that follows class="odds"> ending just before the first < is found.
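
In Python source the trailing s is written as a flag argument rather than as part of the string. The following sketch runs the pattern with and without `re.DOTALL` against a trimmed copy of the sample input; the variable names are chosen here for illustration.

```python
import re

# Trimmed copy of the sample markup from the question (indentation shortened).
html = '''data-event-title="Yuichi Sugita* vs Adrian Mannarino">
    <span class="odds-container">
        <b class="odds">1/12</b>
    </a>'''

pattern = r'data-event-title="([^*]+).*class="odds">([^<]+)'

# With DOTALL the middle .* may cross the line breaks between the two targets.
with_flag = re.search(pattern, html, flags=re.DOTALL)

# Without it, .* is confined to the first line and the search fails.
without_flag = re.search(pattern, html)

print(with_flag.groups())
print(without_flag)
```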

Here is the Python regex pattern demo.


If you want the full data-event-title attribute value, this will capture Yuichi Sugita* vs Adrian Mannarino:

r"data-event-title=\"([^\"]+).*class=\"odds\">([^<]+)"s
mickmackusa
0

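I used alternation with the vertical bar, or pipe symbol (|).

This regex finds both values, but as two separate matches: each match fills only one of the two capture groups, and the name group keeps the trailing *: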

>(.*)<|data-event-title="([^*]*.).*"

See the saved regex on regex101.
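
Since the two alternatives never match at the same time, the pattern has to be applied repeatedly and the groups collected across matches. A minimal Python sketch, assuming a trimmed copy of the question's sample as input:

```python
import re

# Trimmed copy of the sample markup from the question (indentation shortened).
html = '''data-event-title="Yuichi Sugita* vs Adrian Mannarino">
    <span class="odds-container">
        <b class="odds">1/12</b>
    </a>'''

pattern = re.compile(r'>(.*)<|data-event-title="([^*]*.).*"')

odds = None
name = None
for m in pattern.finditer(html):
    # Exactly one of the two groups is set per match; the other is None.
    if m.group(1) is not None:
        odds = m.group(1)
    if m.group(2) is not None:
        name = m.group(2)

print(name, odds)
```

Here `name` comes back as "Yuichi Sugita*" with the asterisk included, so a call such as `name.rstrip('*')` is still needed to get the bare name. The first alternative also matches any other `>text<` on one line, which may produce extra matches on a fuller page.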

Aedvald Tseh