
I want to split a long Scala file into parts, using the Javadoc comments it contains as the boundaries.

source split """(?s)\/\*\*(.*?)\*\/"""

works, but it discards all the Javadoc comments it matches.

How to get all parts?

For example:

/** package */
package test

/**
 * Class user
 */
case class User

It should be split into 4 parts:

/** package */

and

package test

and

/**
 * Class user
 */

and

case class User

How to do it?

Freewind
  • Relevant: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – dhg Mar 15 '12 at 11:02

2 Answers


First, note that programming language syntax is not regular, so it cannot actually be parsed with a regular expression. It is context-free, so you need at least a context-free grammar to parse it. You might be able to get by with a regex for simple cases (i.e., a subset of the true syntax), but it is impossible to write an expression that works in all cases.

That said, this works for the case you gave:

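// Splitting on both delimiters makes the pieces alternate: code, comment, code, ...
val split = source split """(?s)/\*\*|\*/"""
val parts =
  split.grouped(2).flatMap {
    // A full pair: code followed by a comment body; put the delimiters back.
    case Array(code, comment) => Seq(code, "/**" + comment + "*/")
    // A trailing piece of code with no comment after it.
    case code => code.toSeq
  }
  .map(_.trim)
  .filter(_.nonEmpty)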

The variable parts then contains the 4 strings you specified.

This expression will fail on, for example, an input where /** appears inside a Javadoc comment (/** /** */) or inside a string literal (val s = " /** ").
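
To make these limits concrete, here is a minimal, self-contained sketch that wraps the snippet above in a helper and runs it on the question's input and on an input with /** inside a string literal. The names JavadocSplitDemo, splitByJavadoc, sample and tricky are made up for illustration:

```scala
object JavadocSplitDemo {
  // Same technique as above: split on both delimiters, then re-wrap
  // every second piece as a comment.
  def splitByJavadoc(source: String): List[String] =
    source.split("""/\*\*|\*/""").grouped(2).flatMap {
      case Array(code, comment) => Seq(code, "/**" + comment + "*/")
      case code                 => code.toSeq
    }.map(_.trim).filter(_.nonEmpty).toList

  // The input from the question.
  val sample: String =
    "/** package */\npackage test\n\n/**\n * Class user\n */\ncase class User"

  // Hypothetical input: a string literal containing "/**" before a real comment.
  val tricky: String = "val s = \" /** \"\n/** doc */\nclass A"

  def main(args: Array[String]): Unit = {
    splitByJavadoc(sample).foreach(p => println("[" + p + "]"))
    println("----")
    splitByJavadoc(tricky).foreach(p => println("[" + p + "]"))
  }
}
```

On sample this gives the four expected parts. On tricky the pairing gets out of step: the real comment /** doc */ comes back as the bare string doc, and class A ends up wrapped in /** ... */ as if it were a comment.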

dhg
  • Thanks, I know if I want to get the accurate result I need to use a parser, but simple regex is enough for my case. – Freewind Mar 15 '12 at 11:35

Try this:

val source = """/** package */
package test

/**
 * Class user
 */
case class User"""

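// Matches one Javadoc comment; (?s) lets . span newlines.
val R = """(?s)/\*\*.*?\*/"""

// The comments themselves, in order of appearance.
val comments = R.r.findAllIn(source)
// The code between comments; .tail drops the empty string before the first comment
// (this assumes the source starts with a Javadoc comment).
val code = source.split(R).toList.tail

// Interleave comment, code, comment, code, ...
val parts = comments.toList.zip(code).flatMap { case (doc, src) => List(doc, src) }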
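
Note that the .tail above only works when the source starts with a Javadoc comment. If that is not guaranteed, one possible variation (a sketch only; JavadocParts and its parts method are names invented here) is to walk the matches and cut the source at each match's start and end offsets:

```scala
import scala.util.matching.Regex

object JavadocParts {
  // Same pattern as above: one Javadoc comment, matched non-greedily.
  private val Javadoc: Regex = """(?s)/\*\*.*?\*/""".r

  def parts(source: String): List[String] = {
    val out = List.newBuilder[String]
    var last = 0 // end offset of the previous comment
    for (m <- Javadoc.findAllMatchIn(source)) {
      out += source.substring(last, m.start) // code before this comment
      out += m.matched                       // the comment itself
      last = m.end
    }
    out += source.substring(last)            // code after the last comment
    // Drop surrounding whitespace and empty pieces.
    out.result().map(_.trim).filter(_.nonEmpty)
  }
}
```

For the source above, JavadocParts.parts(source) returns the same four parts (trimmed), and it also copes with code that appears before the first comment. It is still regex-based, so the caveats about nested /** and string literals apply.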

As dhg said, using regex to solve such a problem is not recommended. It's slow and fragile.

xiefei