I have to process some text that ends in a date, and I'm trying to extract the individual parts using a Kotlin Regex. The pattern works when the parts are separated by a literal space, but not when I use a word boundary \b instead.

Working Code:

val name = "Prefix Text Sept 7 2007"
var regex = Regex("""(?<prefix>.*?)\b(?<month>Sept|September) (?<day>\d{1,2}) (?<year>\d{2,4})""")
// matchEntire returns null unless the whole input matches the pattern
val matched = regex.matchEntire(name) ?: error("No match for: $name")
println("Prefix: ${matched.groups["prefix"]?.value}")
println("Month: ${matched.groups["month"]?.value}")
println("Day: ${matched.groups["day"]?.value}")
println("Year: ${matched.groups["year"]?.value}")

Expected Output:

Prefix: Prefix Text 
Month: Sept
Day: 7
Year: 2007

If I replace the second line with:

var regex = Regex("""(?<prefix>.*?)\b(?<month>Sept|September)\b(?<day>\d{1,2})\b(?<year>\d{2,4})""")

I don't get a match. The only change is that the space before the day and the space before the year are each replaced with a \b. I would like to understand why this second attempt does not match.

O.O.
  • Word boundary matches a *location* in the string, it does not match any whitespace. – Wiktor Stribiżew Jan 28 '20 at 00:27
  • `\b` is a word boundary, which is zero-length. If there's a white space there (or any character), the pattern won't match. – CAustin Jan 28 '20 at 00:28
  • I'm sorry, but I'm not sure why this question is closed. The discussion in the other question relates to the minus sign - being a non-word character and hence regarded as a word boundary. Here, a white space character is a word boundary, so the second RegEx should have worked. The other question does not answer my problem here. – O.O. Jan 28 '20 at 01:09
  • As explained above, I would like to have this question re-opened. – O.O. Jan 28 '20 at 01:33
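The point made in the first two comments can be checked directly. The following is a minimal sketch, not a definitive answer: `parseDate`, `BOUNDARY_ONLY`, and `WITH_WHITESPACE` are hypothetical names introduced only for this illustration. It shows that `\b` matches an empty string (a position), so the space between "Sept" and "7" is still left unconsumed, and that adding `\s+` after each `\b` is one way to consume it.

```kotlin
// Hypothetical helper: returns the captured groups (prefix, month, day, year),
// or null when the pattern does not match the entire input.
fun parseDate(pattern: Regex, input: String): List<String>? =
    pattern.matchEntire(input)?.groupValues?.drop(1)

// The pattern from the question: \b asserts a position but consumes nothing,
// so the space after "Sept" has to be matched by \d, which fails.
val BOUNDARY_ONLY = Regex(
    """(?<prefix>.*?)\b(?<month>Sept|September)\b(?<day>\d{1,2})\b(?<year>\d{2,4})"""
)

// One possible fix (an assumption about what is wanted): keep \b as an
// assertion and consume the separating whitespace explicitly with \s+.
val WITH_WHITESPACE = Regex(
    """(?<prefix>.*?)\b(?<month>Sept|September)\b\s+(?<day>\d{1,2})\b\s+(?<year>\d{2,4})"""
)

fun main() {
    val name = "Prefix Text Sept 7 2007"

    // A \b match is zero characters long: it is a location, not a character.
    println(Regex("""\b""").find("Sept 7")?.value?.length)

    println(parseDate(BOUNDARY_ONLY, name))
    println(parseDate(WITH_WHITESPACE, name))
}
```

With `\s+` in place the day and year no longer have to start immediately after the month, which is what the `\b`-only pattern was implicitly demanding.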