15

I am new to regular expressions. I am trying to construct a regular expression in which the first three characters must be letters and the rest of the string can be any characters. If the part of the string after the first three characters contains &, then that part must start and end with ".

I was able to construct ^[a-z]{3}, but I am stuck on the conditional part.

For example, abcENT and abc"E&T" are valid strings, but abcE&T is not.

Can this be done in a single expression?

Ooker
user2359634

5 Answers

30

In most regex flavors, you can use simple lookaheads to check whether some text is or is not present somewhere to the right of the current location, and with the alternation operator | it is possible to check for alternatives.

So, we basically have 2 alternatives: either there is a & somewhere in the string after the first 3 letters, or there is not. Thus, we can use

^[A-Za-z]{3}(?:(?=.*&)".*"|(?!.*&).*)$

See the regex demo

Details:

  • ^ - start of string
  • [A-Za-z]{3} - 3 ASCII letters
  • (?:(?=.*&)".*"|(?!.*&).*) - Either of the two alternatives:
    • (?=.*&)".*" - if there is a & somewhere in the string ((?=.*&)), match ", then any 0+ characters, and then "
    • | - or
    • (?!.*&).* - if there is no & ((?!.*&)) in the string, just match any 0+ chars up to the...
  • $ - end of string.
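A minimal Python sketch for trying the lookahead pattern against the examples from the question. The helper name `is_valid_lookahead` is made up for illustration, and Python's `re` module is only one of the flavors that supports lookaheads:

```python
import re

# The lookahead-based pattern from above, compiled once.
LOOKAHEAD_PATTERN = re.compile(r'^[A-Za-z]{3}(?:(?=.*&)".*"|(?!.*&).*)$')

def is_valid_lookahead(s: str) -> bool:
    """Return True if s matches the lookahead-based pattern (hypothetical helper)."""
    return LOOKAHEAD_PATTERN.match(s) is not None

# The three sample strings from the question.
for sample in ['abcENT', 'abc"E&T"', 'abcE&T']:
    print(sample, is_valid_lookahead(sample))
```

Note that in Python `$` also matches just before a trailing newline; if that matters, `\Z` can be used in its place.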

In PCRE, .NET, and some other regex flavors, you have access to the conditional construct. Here is a PCRE demo:

^[A-Za-z]{3}(?(?=.*&)".*"|.*)$
            ^^^^^^^^^^^^^^^^^

The (?(?=.*&)".*"|.*) means:

  • (?(?=.*&) - if there is a & after any 0+ characters...
  • ".*" - match "anything here"-like strings
  • | - or, if there is no &
  • .* - match any 0+ chars from the current position (i.e. after the first 3 letters).
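Not every flavor accepts a lookahead as the condition. Python's `re` module, for instance, only supports conditionals that test whether a capturing group matched, `(?(1)yes|no)`. The sketch below is an adaptation under that assumption, not the PCRE pattern itself: it optionally captures an opening " and branches on that group. The name `is_valid_conditional` is hypothetical.

```python
import re

# Group 1 optionally captures an opening quote.
# If group 1 matched: require anything followed by a closing quote.
# Otherwise: allow only characters other than &.
CONDITIONAL_PATTERN = re.compile(r'^[A-Za-z]{3}(")?(?(1).*"|[^&]*)$')

def is_valid_conditional(s: str) -> bool:
    """Return True if s matches the group-conditional pattern (hypothetical helper)."""
    return CONDITIONAL_PATTERN.match(s) is not None

for sample in ['abcENT', 'abc"E&T"', 'abcE&T', 'abc"def']:
    print(sample, is_valid_conditional(sample))
```

Because the opening quote is optional, the engine can backtrack and treat a leading " as an ordinary character, so an unclosed quote without any & (such as abc"def) is still accepted, as it is by the lookahead version.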
Wiktor Stribiżew
3

A conditional statement can be emulated with | and groups, but it will probably get complicated.

^[a-z]{3}([^&]*$|".*"$)

You might consider using plain old string manipulation for this task; it will probably be simpler.
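As a rough sketch of what the string-manipulation route could look like in Python (the function name `is_valid_plain` is invented here, and "letters" is assumed to mean ASCII letters only):

```python
def is_valid_plain(s: str) -> bool:
    """Check the rule from the question without a regex (hypothetical helper)."""
    head, tail = s[:3], s[3:]
    # The first three characters must all be ASCII letters.
    if len(head) != 3 or not (head.isascii() and head.isalpha()):
        return False
    # Without an ampersand, the rest of the string is unrestricted.
    if '&' not in tail:
        return True
    # With an ampersand, the rest must start and end with a double quote.
    return tail.startswith('"') and tail.endswith('"')

for sample in ['abcENT', 'abc"E&T"', 'abcE&T']:
    print(sample, is_valid_plain(sample))
```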

Axnyff
2

Yes, this is possible. It is not really an "if", but in your case you can express an "or" with a regex capturing group. Your regex would look something like this:

^[A-Za-z]{3}(\".*\"|[^&]*)$

P.S. Here is a good site to test and learn these things: https://regex101.com/

Doktor OSwaldo
1

The expression itself will depend on the regex engine you use. In Python, the shell, Vim, Boost, etc., the same symbol can have different meanings.

I would try the following :

$ echo 'abc"&def"' | grep -E "^[a-zA-Z]{3}(\".*\&.*\"|[^&]*)$"
abc"&def"
fzd
0

Regular expressions don't necessarily support conditionals in the sense of 'if'. To achieve this in the general case, you have to state your conditions as alternatives. (But see Wiktor's comment: depending on your regex engine, conditionals may be available.)

For a relatively basic solution you might try something like this:

^[a-z]{3}([^&]*|".*")$

Which says: "After three letters, there should be a string of any length with no ampersand (&) OR a string starting and ending with a double quote (")."

Jan