-2

I am teaching regular expression to some good programmers. They are good at programming but hardly use regex. My task is to train them up so they know when to use regex and when not.

After showing most regular expression features, I found they are parsing everything with regex. This is not what I want. I want they know that there are some texts that can never be parsed with regex.

But I am out of luck. I know regular expression can parse regular language. If its a non regular language it can not parse it. So I am looking for non regular language example.

My target is when they fail to parse it, they will come up with some custom parser.

So, could you provide some good example of such non-regular language?

Shiplu Mokaddim
  • 56,364
  • 17
  • 141
  • 187

2 Answers2

3

The best example is the parsing html

Show your students this:

<div>
  <div>some shit</div>
  <div>
    This is some shit again
    <div>
      Really? Is this parsable?  
    </div>
  </div>
</div>

and ask them to match the content of the inner most div, provided the html is dynamic.

In general, ask your students not to parse any other language using regex.

The best way to teach them that is by making them read this answer

In other words:

Use regex only when there is a uniform pattern of something

Also,

  • You cannot parse palindromes
  • You cannot parse another regex
  • You cannot match people's names and emails as they vary.(Email can be matched, but is a overkill)
Community
  • 1
  • 1
Amit Joki
  • 58,320
  • 7
  • 77
  • 95
2

A simple and understandable example of a non-regular language would be the language of palindromes, or in other words, strings that are equal to their reverses. It's pretty easy to demonstrate its non-regularity with the pumping lemma (see Wikipedia: http://en.wikipedia.org/wiki/Pumping_lemma)

Mind though, that in practical computing, the distinction isn't quite as clear, as many regular expression engines support features such as back references that allow recognition of certain non-regular languages. A regex engine with back references can match, for example, the language of squares or repetitions ("PonyPony", "123123", "gg" etc): (.*)\1 which isn't possible without back references.

kviiri
  • 3,282
  • 1
  • 21
  • 30
  • So is backreference making regular expression parse non-regular language? – Shiplu Mokaddim May 04 '14 at 08:48
  • @shiplu.mokadd.im, yes, you can use back references to parse certain non-regular languages, for example the language of squares (strings that are the same substring repeated twice). Not all of them though - there are some languages that can't be recognized even with a Turing machine. – kviiri May 04 '14 at 08:54