I am unable to create a regex that detects the following text:

  • some text, FIL values, some text
  • some text, F.I.L. values, some text
  • some text, F.IL values, some text
  • some text, FI.L values, some text
  • some text, FI.L. values, some text

I have tried many things without success, and I have looked through many related questions and websites.

I have also used sites like https://regex101.com/ to check the results.

Some patterns that are close but incomplete:

  • (?i)\b([a-z0-9]\.)([a-z0-9]\.)L\b
  • \b[A-Z](?:[&.]?[A-Z])(?:[&.]?[A-Z])\b
  • \bF(\\.&)?IL\b
  • \bF(\\.)?IL\b
  • [F](?:[\\.&]*[I][\\.&]*[L][\\.&]*)
  • [F|f][\.]*[A-Z]\.[A-Z]\.

I am not sure what I am missing. I have tried matching specific characters and using groups, but none of my attempts matches all of the cases above.

The intention is to capture the word FIL (in any of the dotted forms above) and everything after it up to the next comma.

jalazbe

1 Answer

I think you're over-complicating this. Let's write out the requirements in plain English:

  1. the text must contain the letters "F", "I", and "L", in that order
  2. there may be dots between the letters, and after the "L", but no other characters
  3. there must be a space before the "F", and a space after the "L" (or after the last dot)
  4. continue matching from there until the next comma

So we can build up our regular expression as follows:

  1. /FIL/ matches the literal string "FIL"
  2. \. means a literal dot (because . on its own means "anything"), and \.? means "optional dot"; put those where we want to allow them: /F\.?I\.?L\.?/
  3. a space can be represented literally, / F\.?I\.?L\.? /; but we might want to get smart and allow things like "tab" as well using \s: /\sF\.?I\.?L\.?\s/
  4. matching "until the next comma" is the same as "matching anything other than a comma", which can be written [^,]* (read: "not comma, zero or more times"); so we get /\sF\.?I\.?L\.?\s[^,]*/
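
As a quick check, the final pattern can be run against the five sample lines from the question. This is a minimal sketch in Python, which is an assumption since the question does not say which language or regex flavor is in use; the names `PATTERN`, `samples`, and `find_fil` are made up for illustration. Because the pattern starts with \s, the match includes the leading whitespace, so the sketch strips it.

```python
import re

# The pattern built up in the steps above, as a Python raw string.
PATTERN = re.compile(r"\sF\.?I\.?L\.?\s[^,]*")

# The five sample lines from the question.
samples = [
    "some text, FIL values, some text",
    "some text, F.I.L. values, some text",
    "some text, F.IL values, some text",
    "some text, FI.L values, some text",
    "some text, FI.L. values, some text",
]


def find_fil(text):
    """Return the first FIL-style match without surrounding whitespace, or None."""
    match = PATTERN.search(text)
    return match.group(0).strip() if match else None


results = [find_fil(s) for s in samples]
for sample, result in zip(samples, results):
    print(f"{sample!r} -> {result!r}")
```

Each sample should yield its own variant followed by "values", for example "F.I.L. values" for the second line.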

That's it; all test cases pass.

If you need anything more complex, think up some test cases that should pass and some that should fail, make a small change, and test it. Regular expressions are notoriously hard to read, so it's usually easier to build up from scratch than trying to reverse-engineer somebody else's complicated case.
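
One way to follow that advice is a tiny harness that lists inputs that should match and inputs that should not, and reports any that behave unexpectedly. The sketch below again assumes Python; the helper `check` and the negative examples ("FILE", "F-I-L", "PROFIL") are hypothetical cases invented for illustration, not taken from the question.

```python
import re


def check(pattern, should_match, should_fail):
    """Return the inputs for which the pattern behaves unexpectedly."""
    rx = re.compile(pattern)
    wrong = [s for s in should_match if not rx.search(s)]
    wrong += [s for s in should_fail if rx.search(s)]
    return wrong


# Cases taken from the question.
should_match = [
    "some text, FIL values, some text",
    "some text, F.I.L. values, some text",
]

# Hypothetical negative cases, made up for this sketch.
should_fail = [
    "some text, FILE values, some text",    # extra letter after the L
    "some text, F-I-L values, some text",   # separators other than dots
    "some text, PROFIL values, some text",  # no whitespace before the F
]

failures = check(r"\sF\.?I\.?L\.?\s[^,]*", should_match, should_fail)
print(failures if failures else "all cases behave as expected")
```

After each small change to the pattern, rerunning the harness shows immediately whether a previously passing case has broken.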

IMSoP