I've got a basic cfsearch that works fine, but occasionally it breaks on search strings like the following:

  • my search string]

  • "my search string

  • my search string[

  • my search: string

Any of the above will result in an error like:

Error executing query : org.apache.lucene.queryParser.ParseException: Cannot parse '"my search string': Lexical error at line 1, column 32. Encountered: after : "\"my search string"

I was thinking I could strip out those characters, but a working search term might contain a matched pair of double quotes, e.g. "my search string", which is valid. Is there a preferable way to prepare a string for cfsearch?

So, in the example of:

"my search string

it would strip out the first ". But if the search term was:

"my search string"

all good - leave it alone. Any ideas? Are there any other characters that can cause an error? For example, a hacker tried this:

XyOk,'.](.]]]'

Which caused an error.

femtoRgon
luke
  • In Lucene 4+ these are all special characters: `+ - && || ! ( ) { } [ ] ^ " ~ * ? : \ /`. See http://lucene.apache.org/core/4_0_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Escaping_Special_Characters – John Whish Jan 08 '16 at 14:07
  • thanks john, good to know – luke Jan 11 '16 at 07:46
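
As an illustration of the escaping approach that the linked Lucene page describes, here is a minimal Java sketch that puts a backslash in front of each character from the list above. The class and method names are hypothetical, and it is written against the plain JDK so it does not depend on any particular Lucene version. Lucene's query parser also has a static `escape` helper, which may be the better choice if it is reachable from your CFML code.

```java
// Hypothetical helper, not part of ColdFusion or Lucene.
final class LuceneEscapeSketch {
    // Characters from the list in the comment above. '&' and '|' are
    // escaped one at a time, which also covers the && and || operators.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    // Returns the term with a backslash inserted before every special character.
    static String escape(String term) {
        StringBuilder out = new StringBuilder(term.length() * 2);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

Note that escaping everything also escapes double quotes, so a deliberately quoted phrase would be searched as literal text rather than as a phrase query. Whether that trade-off is acceptable depends on how much query syntax you want to expose to users.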

1 Answer

Use the VerityClean UDF from CFLib to sanitize the Verity/Lucene search parameter. (NOTE: Add :, ^ and * to the pipe-delimited reBadChars variable so they will be stripped for Lucene.)

http://www.cflib.org/udf/verityClean
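
For readers who want to see the general idea without opening the UDF, the following is a rough Java sketch of the same kind of clean-up. It is not the VerityClean code: the class name, method name, and strip list are assumptions made for illustration. It drops a handful of syntax characters and, when the number of double quotes is odd, removes the last one so that any remaining quotes stay paired.

```java
// Hypothetical sanitizer sketch; the strip list is an assumption, not VerityClean's.
final class SearchTermSanitizerSketch {
    // Brackets, braces, parentheses, plus : ^ * and backslash.
    private static final String STRIP = "[]{}():^*\\";

    static String sanitize(String term) {
        StringBuilder out = new StringBuilder(term.length());
        int quotes = 0;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (STRIP.indexOf(c) >= 0) {
                continue; // drop characters the query parser treats as syntax
            }
            if (c == '"') {
                quotes++;
            }
            out.append(c);
        }
        if (quotes % 2 != 0) {
            // Odd number of quotes: remove the last one so the rest stay paired.
            out.deleteCharAt(out.lastIndexOf("\""));
        }
        return out.toString().trim();
    }
}
```

With this sketch, `"my search string` loses its stray quote while `"my search string"` is left alone, which matches the behaviour asked for in the question. Other operators such as `+ - ! ~ ?` are not handled here and may still need attention.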

James Moberg