I want to use a regex in an fq (filter query) in Solr, but I have never done this before.

I have the value below in a field whose field type is "lowercase": Prop=company1@city1@state1@country1@senior analytical chemist, chicago

I want to filter the results with a regex. It should match the value above when the value starts with exactly "company1@city1@state1@country1@" and both chicago and analytical appear anywhere after the last @ symbol.

My requirement is to match the exact values before the last @ and then use a regex on the remainder, because I want to do a free-text search only on the last part. I can't split the data into multiple columns because it's a multi-valued field.

I tried the regex below in code to match the string after the last @. It works fine in code, but I'm not sure how to implement the same thing in Solr:

/([^@]+(?=.*IL)(?=.*chicago)(?=.*analytical))/ig 

Can someone please let me know how to use the above regex with Solr?

GYaN
Nancy

1 Answer

Regular expressions in Solr are available by searching with q=field:/regex/. This assumes that the field in question is a string field (or at least a field with a KeywordTokenizer), because matching happens at the token level. If you have an analyzed field, the value might be split into separate tokens, and no single token will match the regex.
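To illustrate the token-level point, here is a minimal Python sketch. It is not Solr code: `keyword_tokenize` and `whitespace_tokenize` are hypothetical stand-ins for a KeywordTokenizer field and an analyzed text field, and `re.fullmatch` is used as an approximation of the fact that a regex query has to match a whole indexed term.

```python
import re

def keyword_tokenize(value):
    # KeywordTokenizer-style behaviour: the whole field value is one token.
    return [value]

def whitespace_tokenize(value):
    # Crude stand-in for an analyzed text field: split on whitespace.
    return value.split()

def regex_hits(tokens, pattern):
    # A regex query has to match an entire term, so use fullmatch per token.
    return [t for t in tokens if re.fullmatch(pattern, t)]

value = "company1@city1@state1@country1@senior analytical chemist, chicago"
pattern = r".*analytical.*chicago.*"

single_token_hits = regex_hits(keyword_tokenize(value), pattern)
split_token_hits = regex_hits(whitespace_tokenize(value), pattern)

print(single_token_hits)  # the full value matches as one token
print(split_token_hits)   # no individual word matches the whole pattern
```

With the value kept as a single token the pattern can see both words; once the value is split, each token is tested on its own and none of them contains both.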

Something like q=field:/([^@]+(?=.*IL)(?=.*chicago)(?=.*analytical))/ could work, but the i modifier in your original regex indicates that you want case-insensitive matching. I'd use a field with a KeywordTokenizer and a LowerCaseFilter, and then search with a lowercase regex:

<analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>           
    <filter class="solr.LowerCaseFilterFactory" />
</analyzer>

and to query:

q=field:/([^@]+(?=.*il)(?=.*chicago)(?=.*analytical))/
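One thing worth checking if a query like this returns nothing: Lucene's regex syntax is not Perl-compatible and, as far as I know, does not include a lookahead operator (`(?=...)`). The sketch below shows one possible lookahead-free alternative that spells out each word order explicitly. It is an assumption-laden illustration, not a tested Solr query: `escape_literal` and `build_regex` are hypothetical helpers, `field` is a placeholder field name, and the pattern is only verified here against Python's `re`, using constructs (character classes, `*`, groups, `|`) that both syntaxes share. The `@` is backslash-escaped because Lucene may treat a bare `@` as an "any string" operator, depending on which optional operators are enabled.

```python
import re
from itertools import permutations
from urllib.parse import urlencode

def escape_literal(text):
    # Backslash-escape every non-alphanumeric character so that symbols
    # such as "@" are always treated as literals.
    return "".join(c if c.isalnum() else "\\" + c for c in text)

def build_regex(prefix, words, sep="@"):
    # Any run of characters that does not contain the separator.
    gap = "[^" + escape_literal(sep) + "]*"
    # One alternative per ordering of the words, e.g. a...b | b...a.
    orders = "|".join(
        gap.join(escape_literal(w) for w in order)
        for order in permutations(words)
    )
    return escape_literal(prefix) + gap + "(" + orders + ")" + gap

regex = build_regex("company1@city1@state1@country1@", ["analytical", "chicago"])
value = "company1@city1@state1@country1@senior analytical chemist, chicago"
matches_sample = re.fullmatch(regex, value) is not None

# URL-encode the filter query; "field" is a placeholder field name.
params = urlencode({"q": "*:*", "fq": "field:/" + regex + "/"})
print(regex)
print(params)
```

The number of alternatives grows factorially with the number of words, so this is only practical for two or three search terms. With a lowercasing field, the words passed in should already be lowercase.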
MatsLindh
  • Thanks MatsLindh for the reply. I tried to frame the query as above, but it returns zero results. The data is in my index and I can fetch it when I query for an exact match, but the regex query does not return the record. – Nancy Apr 04 '18 at 06:27
  • @Nancy So what is the definition of the field? Did you reindex after changing it? Do you have an actual field value that can be tested against? – MatsLindh Apr 04 '18 at 07:19
  • Do you have an example value (anonymized as necessary) for the field? – MatsLindh Apr 06 '18 at 06:50
  • The field type is "lowercase"; it's using the same analyzer you suggested. The core was reindexed after the changes. The exact value is **mumbai-baerum-test engineer trainee**. I want to match this value if the user passes mumbai-marium-trainee engineer. I want an exact match on the first two values and a regex that searches only after the last hyphen (-). The words can appear anywhere in the part after the last hyphen. – Nancy Apr 06 '18 at 06:53
  • I wasn't sure how to combine an exact match and a regex in a single query, so I just tried matching the last part. I tried **fq=field:/([^-]+(?=.*engineer)(?=.*trainee))/** but it returns no results. Then I tried **fq=field:/(.*engineer.*)(.*trainee.*)/**, which does return the result, but it compares the whole string, and if I put trainee first and then engineer, I get nothing. I don't have much regex knowledge, so I'm not able to debug what's going wrong. – Nancy Apr 06 '18 at 07:05
  • Your second query isn't the same as the first one, so if you're performing a different query from the one you say you want, you won't get any hits. Without `(?=` you're effectively saying that position _does matter_. See [Regex for existence of some words whose order doesn't matter](https://stackoverflow.com/questions/24656131/regex-for-existence-of-some-words-whose-order-doesnt-matter) for an example and explanation. However, I think I'd try to rephrase this to parse the meaning of your string when you're indexing instead of shoehorning regex to make it work when querying. – MatsLindh Apr 06 '18 at 07:11