1

can anyone help me with the following regular expression Language of all those words that don’t contain the substring aaaa. i just want to get the idea. i came up with the following R.E

" (ab)*(aaaaa)*"

any help will be appreciated. thanks

Mazdak
  • 105,000
  • 18
  • 159
  • 188
  • 1
    The use of the term "language" indicates that you might be approaching this from a more computer sciencey perspective, rather than a programming perspective. Are we allowed to use constructs like negative lookaheads, which aren't in CS definitions of regular expressions? In a practical program, we would just search for a match for `aaaa` and negate the result. – user2357112 Apr 11 '15 at 20:19
  • i am studying these in modelling and simulation subject – pasha ahmed Apr 11 '15 at 20:25
  • @pashaahmed: how is this a part of a *Modeling and Simulation course*, this should be part of *Fundamentals*, or *Automata and Computability*, but this has almost nothing to do with modeling nor simulating... – Willem Van Onsem Apr 11 '15 at 20:30
  • well i dont know :) i am students of bs in bioinformatics. it's our 7th sem course. – pasha ahmed Apr 11 '15 at 20:46

1 Answers1

1

You can use a negative look-ahead :

^((?!aaaa).)*$

This will match any combinations of length 0 or more of characters that not followed by aaaa.

Regular expression visualization

aaaa matches the characters aaaa literally (case sensitive)

. matches any character (except newline)

* Between zero and unlimited times

Debuggex Demo

Also as an alternative and more efficient way in general as @CommuSoft says you can use the following pattern :

^(a{1,3}[^a]|[^a])*a{0,3}$
Mazdak
  • 105,000
  • 18
  • 159
  • 188
  • thanks. i want to learn more about these regular expressions. can you share an easy source @kasra – pasha ahmed Apr 11 '15 at 20:17
  • @pashaahmed welcome but as far as i know this is the best and more elegant way for this problem! for more detail you can read http://stackoverflow.com/questions/406230/regular-expression-to-match-text-that-doesnt-contain-a-word – Mazdak Apr 11 '15 at 20:21
  • If the regex engine doesn't use dynamic programming - which unfortunately many don't seem to do - it can result in a computationally intensive task... Another way to do this is `^(a{1,3}[^a]|[^a])*$` I think... – Willem Van Onsem Apr 11 '15 at 20:24
  • @CommuSoft As far as i know except bash mot of the popular lang support `look-around` and also dynamic programming – Mazdak Apr 11 '15 at 20:27
  • @Kasra: I don't say they don't support it, and `grep` does this as well btw (Java for instance doesn't). But it is in many cases not implemented with an efficient algorithm. Which is too bad... – Willem Van Onsem Apr 11 '15 at 20:28
  • @CommuSoft yep, in that case, i'm agree! i'll update the answer thanks for attention! – Mazdak Apr 11 '15 at 20:30
  • 1
    @Kasra: indeed, there is a small part missing `^(a{1,3}[^a]|[^a])*a{0,3}$`... Furthermore, dont forget the `^` and `$` markers in your answer, otherwise the regex will [simply match a small substring](https://regex101.com/r/oT0sX0/1). – Willem Van Onsem Apr 11 '15 at 20:32