I have a string with some XML tags in it, like:

"hello <b>world</b> and <i>everyone</i>"

Is there a good Scala/functional way of uppercasing the words, but not the tags, so that it looks like:

"HELLO <b>WORLD</b> AND <i>EVERYONE</i>"
Peter Neyens
Ian White

2 Answers

We can use dustmouse's regex with Regex.replaceAllIn to replace every word that lies outside the XML tags. Regex.Match.matched returns the matched text, which can then be uppercased with toUpperCase.

val xmlText = """(?<!<|<\/)\b\w+(?!>)""".r

val string = "hello <b>world</b> and <i>everyone</i>"
xmlText.replaceAllIn(string, _.matched.toUpperCase)
// String = HELLO <b>WORLD</b> AND <i>EVERYONE</i>

val string2 = "<h1>>hello</h1> <span>world</span> and <span><i>everyone</i>"
xmlText.replaceAllIn(string2, _.matched.toUpperCase)
// String = <h1>>HELLO</h1> <span>WORLD</span> AND <span><i>EVERYONE</i>

Using dustmouse's updated regex :

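val xmlText2 = """(?:<[^<>]+>\s*)(\w+)""".r

val string3 = """<h1>>hello</h1> <span id="test">world</span>"""
// Keep the tag part of each match (group 0 minus group 1) and uppercase only the captured word.
xmlText2.replaceAllIn(string3, m =>
  m.group(0).dropRight(m.group(1).length) + m.group(1).toUpperCase)
// String = <h1>>hello</h1> <span id="test">WORLD</span>
// "hello" stays lowercase: the stray ">" separates it from the <h1> tag, so the pattern does not match it.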
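
A minimal sketch of a different technique from the regexes above, in case every word outside a tag should be uppercased, including words that do not directly follow a tag. It matches either a whole tag or a run of text without angle brackets and uppercases only the text runs. The object name `TagAwareUpper` and the method `upperOutsideTags` are hypothetical. It assumes simple markup: no `>` inside attribute values, no comments or CDATA, and no entities such as `&amp;` (those would be uppercased too).

```scala
import scala.util.matching.Regex

object TagAwareUpper {
  // First alternative: a whole tag such as <b>, </b> or <span id="test">.
  // Second alternative: a run of characters that contains no angle brackets.
  private val tagOrText: Regex = """<[^<>]+>|[^<>]+""".r

  // Hypothetical helper: uppercase everything except the tags.
  def upperOutsideTags(s: String): String =
    tagOrText.replaceAllIn(s, m => {
      // A text run can never start with '<', so this test identifies tags.
      val out = if (m.matched.startsWith("<")) m.matched else m.matched.toUpperCase
      // Escape '$' and '\' so they are not read as group references.
      Regex.quoteReplacement(out)
    })
}
```

With this sketch, `TagAwareUpper.upperOutsideTags(string3)` should give `<h1>>HELLO</h1> <span id="test">WORLD</span>`: the stray `>` matches neither alternative, so replaceAllIn copies it through unchanged.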
Peter Neyens
  • Needs work. `>hello world and everyone` results in `>HELLO WORLD AND EVERYONE` – The Archetypal Paul Aug 26 '15 at 20:32
  • @dustmouse That works for Paul's case, but it still fails once attributes are added (``). The OP has not specified attributes, though, so it is definitely an improvement. – Peter Neyens Aug 26 '15 at 21:54
  • How about this: (?:<[^<>]+>\s*)(\w+). https://regex101.com/r/bU1cD9/4. As long as scala can work with capture groups. – lintmouse Aug 26 '15 at 22:04
Okay, how about this? It only prints the matched words, but it handles some of the scenarios raised by others. I'm not sure how to capitalize the output without mercilessly poaching from Peter's answer:

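val string = "<h1 id=\"test\">hello</h1> <span>world</span> and <span><i>everyone</i></span>"
// Non-capturing group: a tag plus optional whitespace. Group 1: the word that follows it.
val pattern = """(?:<[^<>]+>\s*)(\w+)""".r

// Print only the captured word of each match.
pattern.findAllIn(string).matchData foreach {
  m => println(m.group(1))
}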

The main thing here is that it is extracting the correct capture group.
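
As a small sketch of the same capture-group extraction, the words can also be collected into a list instead of printed, which makes the result easier to inspect. The object name `CaptureGroups` and the method `wordsAfterTags` are hypothetical; the pattern is the one from this answer.

```scala
object CaptureGroups {
  // Same pattern as above: a tag, optional whitespace, then the captured word.
  val pattern = """(?:<[^<>]+>\s*)(\w+)""".r

  // Hypothetical helper: return group 1 of every match, in order.
  def wordsAfterTags(s: String): List[String] =
    pattern.findAllMatchIn(s).map(_.group(1)).toList
}
```

For the string in this answer, `CaptureGroups.wordsAfterTags(string)` should return `List("hello", "world", "and", "everyone")`. For the question's original string it returns only `List("world", "and", "everyone")`, because the leading "hello" has no tag in front of it.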

Working example: http://ideone.com/2qlwoP

Credit also goes to this answer on getting capture groups in Scala: Scala capture group using regex

lintmouse