
I want to match alphanumeric words with a Lucene automaton regex, excluding words that are entirely numeric or entirely alphabetic. I have tried:

(([a-zA-Z0-9]{1,10})&(.*[0-9].*))

but this also returns all-numeric words. So I tried to exclude all-numeric words as shown below, but it does not work:

(^[0-9])(([a-zA-Z0-9]{1,10})&(.*[0-9].*))

Input String:

  1. DL200, dal2 , 700091

Expected output: DL200 and dal2, but not 700091.

happy

2 Answers


I don't know much about the Lucene regex flavor, but a little research taught me that it does not support the PCRE library, although some standard operators are supported. I found that it includes neither lookarounds nor word boundaries. Have a look at the docs.

Either way, to work around the lack of lookarounds, I had a look at this older SO post, which uses ~ instead. Furthermore, you can use the & operator to check that a string matches multiple patterns.

Based on that, I assume the following pattern might work for you:

~[0-9]+&~[^0-9]+&[A-Za-z0-9]{2,10}
  • ~[0-9]+ - Negate a string made of numbers only.
  • &
  • ~[^0-9]+ - Negate a string made of non-numbers only.
  • &
  • [A-Za-z0-9]{2,10} - Matches a string that is made out of 2 to 10 alphanumeric characters.
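
The three conditions above can be checked outside Lucene as a rough sanity test. The sketch below is not Lucene code: it uses Python's `re.fullmatch` as a stand-in for Lucene's whole-string matching, models `~` as a logical "not" and `&` as a logical "and", and the helper names `full` and `is_mixed_alnum` are hypothetical. It also assumes that `~` covers the whole of `[0-9]+`; in Lucene's grammar the complement appears to bind more tightly than `+`, so explicit parentheses such as `~([0-9]+)` may be needed there.

```python
import re


def full(pattern: str, word: str) -> bool:
    """Return True when the whole word matches (Lucene patterns are anchored)."""
    return re.fullmatch(pattern, word) is not None


def is_mixed_alnum(word: str) -> bool:
    """Emulate ~([0-9]+) & ~([^0-9]+) & [A-Za-z0-9]{2,10} with plain booleans."""
    not_all_digits = not full(r"[0-9]+", word)        # ~([0-9]+)
    not_all_non_digits = not full(r"[^0-9]+", word)   # ~([^0-9]+)
    alnum_2_to_10 = full(r"[A-Za-z0-9]{2,10}", word)  # [A-Za-z0-9]{2,10}
    return not_all_digits and not_all_non_digits and alnum_2_to_10


# Sample words from the question, plus one purely alphabetic word.
words = ["DL200", "dal2", "700091", "hello"]
matched = [w for w in words if is_mixed_alnum(w)]
print(matched)
```

Under these assumptions only the words that mix letters and digits pass all three checks.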
JvdV
  • Your answer helped. I added an answer with the regex I used. – happy Oct 06 '20 at 08:42
  • @happy, there you go. Nice answer. I missed that you wanted to avoid a fully alphabetical string too. In that case, maybe `~[0-9]+&~[^0-9]+&[A-Za-z0-9]{2,10}` works? – JvdV Oct 06 '20 at 09:23

With the help of JvdV's answer and https://stackoverflow.com/a/38665819/9758194, I was able to get the desired output:

(([a-zA-Z0-9]{1,10})&(.*[0-9].*))&~([0-9]*)
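
To see what this expression accepts on the sample input, here is a minimal sketch that emulates it in Python rather than running Lucene itself. Each sub-pattern is applied with `fullmatch`, the intersection `&` becomes `and`, and the complement `~([0-9]*)` becomes "does not fully match". Splitting the input on commas and the name `accepts` are assumptions made for illustration.

```python
import re

# The three pieces of the Lucene expression, each matched against the whole token.
ALNUM = re.compile(r"[a-zA-Z0-9]{1,10}")  # ([a-zA-Z0-9]{1,10})
HAS_DIGIT = re.compile(r".*[0-9].*")      # (.*[0-9].*)
ALL_DIGITS = re.compile(r"[0-9]*")        # the pattern negated by ~([0-9]*)


def accepts(token: str) -> bool:
    """True when the token is alphanumeric, contains a digit, and is not all digits."""
    return (
        ALNUM.fullmatch(token) is not None
        and HAS_DIGIT.fullmatch(token) is not None
        and ALL_DIGITS.fullmatch(token) is None
    )


text = "DL200, dal2 , 700091"
tokens = [t.strip() for t in text.split(",")]  # assumed tokenization for the demo
result = [t for t in tokens if accepts(t)]
print(result)
```

Requiring at least one digit already rules out purely alphabetic words, so no separate "not all letters" clause is needed.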
happy