Here is my code:

object test extends App {

  private val PLAYER_REGEX = """[\s\S]*(?:<td class="align-middle plus-size"> <s class="text-muted">|<td class="align-middle plus-size">)(.*)(?:</s> </td></tr>|</td></tr>)""".r
  val str ="""<td class="align-middle plus-size"> <s class="text-muted">first</s> </td></tr>"""
  val str2 ="""<td class="align-middle plus-size">second</td></tr>"""

  private def find(str:String) = {
    PLAYER_REGEX.findFirstMatchIn(str) match {
      case Some(data) => data.group(1).trim
      case None => "Not found"
    }
  }
  println(find(str))
  println(find(str2))
}

And the output is:

first</s>
second

My question is: why is there a redundant

</s>

in the first case? I thought that

(?:</s> </td></tr>|</td></tr>)

should select the first alternative,

</s> </td></tr>

but it looks like it selects

</td></tr>

Of course I can trim it, but that looks ugly. If you can provide another regex, I'd be glad too :)

  • See https://stackoverflow.com/a/1732454/5344058... – Tzach Zohar Nov 14 '17 at 16:39
  • Can you clarify what you mean by `why those redundant </s> in first case?` ? The only redundancy I see is in the pattern, since `(?:</s> </td></tr>|</td></tr>)` could be written `(?:(?:</s> )?</td></tr>)` – Aaron Nov 14 '17 at 16:43
  • Hi, Aaron. It should return just "first", not "first</s>". At least I tried to do exactly that :) – Andrei Markhel Nov 14 '17 at 16:45
  • `(?:(?:</s> )</td></tr>)` works for the first case, but not for the second :) – Andrei Markhel Nov 14 '17 at 16:47
  • redundancy = multiple occurrences of something that would work alone ;) You're getting this result because the previous `.*`, being greedy, will match up to the end of the string, then backtrack until the next token (the discussed group) matches. It matches "without the `</s>`" before it backtracks up to the `</s>` – Aaron Nov 14 '17 at 16:47
  • @AndreiMarkhel I missed a `?` after the nested group, sorry (ninja-edited !). It won't solve your problem anyway – Aaron Nov 14 '17 at 16:47

1 Answer

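It's caused by the greedy quantifier in the capture group, which also consumes `</s> ` and backtracks only far enough for the shorter alternative `</td></tr>` to match: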

…(.*)…

Instead, use the lazy version:

…(.*?)…
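
For reference, here is a minimal sketch of the question's code with only that change applied. The names `LazyRegexDemo` and `PlayerRegex` are made up for illustration, and it uses a plain `main` method instead of `App`; everything else is taken from the question.

```scala
object LazyRegexDemo {

  // Same pattern as in the question, except the capture group is lazy:
  // (.*?) instead of (.*). It stops at the first position where the
  // closing group can match, so "</s> " is no longer captured.
  private val PlayerRegex =
    """[\s\S]*(?:<td class="align-middle plus-size"> <s class="text-muted">|<td class="align-middle plus-size">)(.*?)(?:</s> </td></tr>|</td></tr>)""".r

  // The two sample inputs from the question
  val str = """<td class="align-middle plus-size"> <s class="text-muted">first</s> </td></tr>"""
  val str2 = """<td class="align-middle plus-size">second</td></tr>"""

  // Returns the trimmed contents of group 1, or "Not found" if nothing matches
  def find(input: String): String =
    PlayerRegex.findFirstMatchIn(input) match {
      case Some(m) => m.group(1).trim
      case None    => "Not found"
    }

  def main(args: Array[String]): Unit = {
    println(find(str))  // first
    println(find(str2)) // second
  }
}
```

With the lazy group, the first input matches the longer alternative `</s> </td></tr>` right after "first", while the second input still falls through to `</td></tr>`.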
Nissa
  • I believe it's the following `(.*)` that greedily captures too much (check the displayed group). But yeah there would be a problem there too – Aaron Nov 14 '17 at 16:50
  • Thanks, Stephen. In that case I can't remove [\s\S]*, because I simplified things a bit and the whole expression is quite a bit bigger; I just extracted the part that doesn't work. I can provide the whole expression, of course, but I think it would only add complexity – Andrei Markhel Nov 14 '17 at 16:52
  • @AndreiMarkhel ah, okay. – Nissa Nov 14 '17 at 16:53
  • Yeah, it did the trick. I mean "(.*?)". Thank you very much! Thanks to Aaron as well! – Andrei Markhel Nov 14 '17 at 16:57