Let's say I have a query string like this:

#some terms! "phrase query" in:"my container" in:group_3

or

#some terms!

or

in:"my container" in:group_3 terms! "phrase query"

or

in:"my container" test in:group_3 terms!

What is the best way to parse this correctly?

I've looked at Lucene's SimpleQueryParser, but it seems quite complicated for my use case. I'm also trying to parse the query with regexes, but I haven't been successful so far, mostly because whitespace can appear inside quotes.

Any simple ideas?

I just need a list of elements as output; after that, it's pretty easy for me to solve the rest of the problem:

[
  "#some",
  "terms!",
  "phrase query",
  "in:\"my container\"",
  "in:group_3"
]
Sebastien Lorber
  • Maybe try including `".*?"` in your regex; this will match anything inside the quotes – Maciej Kozieja Feb 10 '17 at 14:08
  • I added a possible and very simple output – Sebastien Lorber Feb 10 '17 at 14:18
  • @MYGz I'm not a regex expert yet, but your solution looks like good inspiration. However, it only works for my exact input and isn't really general (for example, it fails for `#some terms! "phrase query" in:"my container" `, which is my query minus the end) – Sebastien Lorber Feb 10 '17 at 14:34
  • Possible duplicate of [Regex for splitting a string using space when not surrounded by single or double quotes](http://stackoverflow.com/questions/366202/regex-for-splitting-a-string-using-space-when-not-surrounded-by-single-or-double) – sp00m Feb 10 '17 at 14:43

2 Answers

The following regex matches the text of your output:

(?:\S*"(?:[^"]+)"|\S+)

See the demo
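
As a rough illustration of how this regex could be applied, here is a minimal Scala sketch. The object name `QueryTokenizer` and the method `tokenize` are hypothetical, and the redundant non-capturing groups have been dropped from the pattern. Note that a standalone phrase keeps its surrounding quotes in the result, so it differs slightly from the output listed in the question and may need a post-processing step.

```scala
object QueryTokenizer {
  // Same alternatives as the regex above: either an optional run of
  // non-space characters followed by a double-quoted phrase, or a plain
  // run of non-whitespace characters.
  private val Token = """\S*"[^"]+"|\S+""".r

  // Returns the matched tokens in the order they appear in the query.
  def tokenize(query: String): List[String] =
    Token.findAllIn(query).toList
}
```

For the first query in the question, `QueryTokenizer.tokenize` should return `#some`, `terms!`, `"phrase query"`, `in:"my container"` and `in:group_3`, in that order.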

Niitaku

Just for those interested, here's the final Scala/Java parser I used to solve my problem, inspired by answers in this question:

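// QueryTerm was not shown in the original post; this minimal definition is an assumption.
case class QueryTerm(prefix: Option[String], text: String)

def testMatcher(query: String): Unit = {
  // Optional alphabetic prefix followed by a colon, e.g. "in:"
  def optionalPrefix(groupName: String) = s"(?:(?:(?<$groupName>[a-zA-Z]+)[:])?)"
  // A double-quoted phrase, which may contain whitespace
  val quoted = optionalPrefix("prefixQuoted") + "\"(?<textQuoted>[^\"]*)\""
  // A bare term without whitespace or quotes
  val unquoted = optionalPrefix("prefixUnquoted") + "(?<textUnquoted>[^\\s\"]+)"
  val regex = quoted + "|" + unquoted
  val matcher = regex.r.pattern.matcher(query)
  var results: List[QueryTerm] = Nil
  while (matcher.find()) {
    // Exactly one alternative matched, so one of the two text groups is non-null
    val quotedResult = Option(matcher.group("textQuoted")).map(textQuoted =>
      (Option(matcher.group("prefixQuoted")), textQuoted)
    )
    val unquotedResult = Option(matcher.group("textUnquoted")).map(textUnquoted =>
      (Option(matcher.group("prefixUnquoted")), textUnquoted)
    )
    val anyResult = quotedResult.orElse(unquotedResult).get
    results = QueryTerm(anyResult._1, anyResult._2) :: results
  }
  // Terms were prepended, so reverse to print them in query order
  println(s"results=${results.reverse.mkString("\n")}")
}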
Sebastien Lorber