In Solr I have a field dedicated to URLs. The URL field can be anywhere up to 2000 in length. However, I only ever need to search the first 200 characters.
Example URL: https://www.google.co.uk/search/2014/here/?q=help+me&oq=stackoverflow&aqs=c
I've experimented over the last 2 weeks with Grams and various combinations of Tokenizers to no avail. I always seem to fall short. I would provide examples but they are all standard so no point cluttering this with non-working types.
The main problem seems to be with how Solr deals with punctuation. It treats non-A-z/0-9 characters as separators. How do I disable this for a field?
For example I can search: 'google' and get the correct result, but when I search 'google.co' nothing comes back. Same problem with most of the non-A-z/0-9 characters, it seems to treat them as a separator.
Everything needs to be *wildcard*searchable from 4char strings up to 200 char strings.
So the following search terms would return the above result. '&aqs','ow&aqs=','ps://www.goo','q=help+','2014/he'... etc
How would you define a field type for the URL wildcard use case?