2

I'm thinking of presenting questions in the form of "here is your input: [foo], here are the capture groups/results: [bar]" (and maybe writing a small script to test their answers for my results).

What are some good regex questions to ask? I need everything from beginner questions like "validate a 4 digit number" to "extract postal codes from addresses".
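For the "small script to test their answers" idea, here is a minimal sketch in Python. The function name `check_answer` and the sample cases are made up for illustration; each case pairs an input with the capture groups a correct regex should produce, or `None` when the input should not match.

```python
import re

def check_answer(pattern, cases):
    """Return (input, expected, actual) for every case the pattern gets wrong."""
    regex = re.compile(pattern)
    failures = []
    for text, expected in cases:
        match = regex.fullmatch(text)          # the whole input must match
        actual = match.groups() if match else None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

# Hypothetical exercise: "validate a 4 digit number" (no capture groups,
# so a successful match yields an empty tuple).
four_digit_cases = [
    ("1234", ()),
    ("0007", ()),
    ("123", None),
    ("12345", None),
    ("12a4", None),
]

print(check_answer(r"\d{4}", four_digit_cases))  # a correct answer: no failures
print(check_answer(r"\d+", four_digit_cases))    # too permissive: two failures
```

Using `fullmatch` rather than `search` is a design choice: it forces students to account for the whole input instead of relying on a lucky partial match.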

girasquid
  • 4
    Since there's no "Right" answer, and this is just discussion with voting, you might want to make this a "Community Wiki" question. Edit the question, click the community wiki button. – S.Lott Dec 29 '09 at 19:59
  • I don't see why this should be CW. Just because there might be more than one right answer shouldn't stop people from earning rep by posting good ones. In addition, it would make sense to mark the most helpful answer as accepted. There are loads of questions with lots of good answers - this is just another one. – Samir Talwar Dec 29 '09 at 22:55
  • Just as a comment on the question, for developing, testing, digesting and understanding any given regular expression, check out the following tool: http://ultrapico.com/expresso.htm . It's an awesome tool, and will show you what your regex does in "plain english". – eidylon Dec 31 '09 at 02:05

13 Answers

4

A few that I can think of off the top of my head:

  1. Phone numbers in any format e.g. 555-5555, 555 55 55 55, (555) 555-555 etc.
  2. Remove all html tags from text.
  3. Match social security number (Finnish one is easy;)
  4. All IP addresses
  5. IP addresses with shorthand netmask (xx.xx.xx.xx/yy)
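As a possible model answer for item 5, here is a sketch in Python, assuming IPv4 only and a prefix length of 0 to 32. The names `OCTET`, `CIDR` and `parse_cidr` are made up for this example.

```python
import re

# One octet, 0-255, without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"

# Group 1: dotted-quad address. Group 2: prefix length 0-32.
CIDR = re.compile(
    rf"({OCTET}(?:\.{OCTET}){{3}})/(3[0-2]|[12]?\d)",
    re.ASCII,
)

def parse_cidr(text):
    """Return (address, prefix_length) or None if the text is not valid."""
    m = CIDR.fullmatch(text)
    return (m.group(1), int(m.group(2))) if m else None

print(parse_cidr("192.168.1.0/24"))   # ('192.168.1.0', 24)
print(parse_cidr("256.1.1.1/24"))     # None: 256 is not a valid octet
print(parse_cidr("10.0.0.1/33"))      # None: prefix too long
```

The range check on each octet is the interesting part for students: `\d{1,3}` alone would happily accept 999.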
Kimvais
3

There's a bunch of examples of various regular expression techniques over at www.regular-expressions.info - everything from simple literal matching to backreferences and lookahead.

James Kolpack
2

To keep things a bit more interesting than the usual email/phone/url stuff, try looking for more original exercises. Avoid boredom.

For example, have a look at the Forsyth-Edwards Notation, which is used for describing a particular board position of a chess game.

Have your students validate and extract all the bits of information from a string like this:

rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2
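One way a model answer for this string might look, sketched in Python. The group names and the `parse_fen` helper are made up here, and the pattern only checks the shape of the six fields; it does not verify that each rank adds up to eight squares.

```python
import re

FEN = re.compile(
    r"(?P<placement>[pnbrqkPNBRQK1-8]+(?:/[pnbrqkPNBRQK1-8]+){7}) "  # 8 ranks
    r"(?P<active>[wb]) "                                # side to move
    r"(?P<castling>-|(?=[KQkq])K?Q?k?q?) "              # castling rights
    r"(?P<en_passant>-|[a-h][36]) "                     # en passant target
    r"(?P<halfmove>\d+) "                               # halfmove clock
    r"(?P<fullmove>\d+)"                                # fullmove number
)

def parse_fen(text):
    """Return a dict of the six FEN fields, or None if the shape is wrong."""
    m = FEN.fullmatch(text)
    return m.groupdict() if m else None

fields = parse_fen("rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2")
print(fields["active"], fields["castling"], fields["fullmove"])  # b KQkq 2
```

The lookahead in the castling field stops four optional letters from matching the empty string, which is a nice trap to let students fall into first.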

Additionally, have a look at algebraic chess notation, used to describe moves. Extract chess moves out of a piece of text (and make them bold).

1. e4 e5 2. Nf3 Black now defends his pawn 2...Nc6 3. Bb5 Black threatens c4

Geert
1
  • Validate phone numbers (extract area code + rest of number with grouping) (Assuming US phone numbers, otherwise generalize for your style)
  • Play around with validating email addresses (you probably want to tell the students that the full regular expression is hugely complicated, but for simple addresses it is pretty straightforward)
Jesse Vogt
1

regexplib.com has a good library you can search through for examples.

eidylon
0

How about extracting first name, middle name, last name, and personal suffix (Jr., III, etc.) from a format like:

Smith III, John Paul
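A sketch of what an answer could look like in Python, assuming single-word names, an optional middle name, and a short fixed list of suffixes. The group names and the `parse_name` helper are invented for this example.

```python
import re

NAME = re.compile(
    r"(?P<last>[A-Za-z'-]+)"                    # last name
    r"(?: (?P<suffix>Jr\.|Sr\.|II|III|IV))?"    # optional suffix
    r", (?P<first>[A-Za-z'-]+)"                 # first name after the comma
    r"(?: (?P<middle>[A-Za-z'-]+))?"            # optional middle name
)

def parse_name(text):
    """Return a dict with last, suffix, first, middle (None when absent)."""
    m = NAME.fullmatch(text)
    return m.groupdict() if m else None

print(parse_name("Smith III, John Paul"))
# {'last': 'Smith', 'suffix': 'III', 'first': 'John', 'middle': 'Paul'}
print(parse_name("Doe, Jane"))
# {'last': 'Doe', 'suffix': None, 'first': 'Jane', 'middle': None}
```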

How about a regex to remove line breaks and tabs from the input?

HLGEM
0

I would start with the common ones:

  • validate email
  • validate phone number
  • separate the parts of a URL
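For the URL exercise, one possible reference answer is the generic URI-splitting expression from RFC 3986, Appendix B, shown here in Python with named groups. The group names and the `split_url` helper are my own additions; the expression splits any string into parts and does not validate them.

```python
import re

URL = re.compile(
    r"^(?:(?P<scheme>[^:/?#]+):)?"      # e.g. http
    r"(?://(?P<authority>[^/?#]*))?"    # e.g. www.example.com
    r"(?P<path>[^?#]*)"                 # e.g. /a/b
    r"(?:\?(?P<query>[^#]*))?"          # e.g. x=1
    r"(?:#(?P<fragment>.*))?"           # e.g. top
)

def split_url(text):
    """Return a dict of URL components; missing parts are None."""
    return URL.match(text).groupdict()

parts = split_url("http://www.example.com/a/b?x=1#top")
print(parts["scheme"], parts["authority"], parts["path"])
# http www.example.com /a/b
```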
Gabriel McAdams
0

Be cruel. Tell them to parse HTML.

RegEx match open tags except XHTML self-contained tags

Community
The Matt
0

Are you teaching them theory of finite automata as well?

Here is a good one: parse the addresses of churches correctly from this badly structured format (copy and paste it as text first) http://www.churchangel.com/WEBNY/newhart.htm

Hamish Grubijan
0

I'm a fan of parsing date strings. Define a few common date formats, as well as time and date-time formats. These are often good exercises because some dates are simple mixes of digits and punctuation. There's a limited degree of freedom in parsing dates.
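As an illustration, here is a sketch in Python for one format, assuming YYYY-MM-DD. The names `ISO_DATE` and `parse_iso_date` are invented for this example. The regex checks the shape and the rough ranges, and `datetime.date` catches what a regex cannot reasonably check, such as February 30.

```python
import re
from datetime import date

ISO_DATE = re.compile(
    r"(?P<year>\d{4})-(?P<month>0[1-9]|1[0-2])-(?P<day>0[1-9]|[12]\d|3[01])",
    re.ASCII,
)

def parse_iso_date(text):
    """Return a date for a valid YYYY-MM-DD string, else None."""
    m = ISO_DATE.fullmatch(text)
    if not m:
        return None
    try:
        return date(int(m["year"]), int(m["month"]), int(m["day"]))
    except ValueError:
        # Right shape, but not a real calendar date (e.g. 2009-02-30).
        return None

print(parse_iso_date("2009-12-29"))   # 2009-12-29
print(parse_iso_date("2009-02-30"))   # None
print(parse_iso_date("29/12/2009"))   # None
```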

S.Lott
0

Just to throw them for a loop, why not reword a question or two to suggest that they write a regular expression to generate data fitting a specific pattern like email addresses, phone numbers, etc.? It's the same thing as validating, but can help them get out of the mindset that regex is just for validation (the data generation tool in Visual Studio, for example, uses regex to randomly generate data).

Mayo
0

Rather than teaching examples based on the data set, I would do examples from the perspective of the rule set to get the basics across. Give them simple examples to solve that lead them to use ONE of several basic constructs in each solution. Then have a couple of "compound" regexes at the end.

Simple: s/abc/def/

Spinners and special characters: s/a\s*b/abc/

Character classes: s/[abc]/def/

Backreference: s/ab(c)/def$1/

Anchors: s/^fred/wilma/ s/rubble$/and betty/

Modifiers: s/Abcd/def/gi
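If the class uses Python rather than Perl or sed, the same progression could be sketched with `re.sub`. The sample inputs are made up. Two differences are worth pointing out to students: `re.sub` replaces every match unless `count=1` is given, and the end anchor is written here as `rubble$`, since `$` only anchors at the end of a pattern.

```python
import re

results = [
    re.sub(r"abc", "def", "abc abc", count=1),      # simple literal
    re.sub(r"a\s*b", "abc", "a   b", count=1),      # quantifier + special char
    re.sub(r"[abc]", "def", "cat", count=1),        # character class
    re.sub(r"ab(c)", r"def\1", "abc", count=1),     # backreference
    re.sub(r"^fred", "wilma", "fred and fred"),     # start anchor
    re.sub(r"rubble$", "and betty", "barney rubble"),       # end anchor
    re.sub(r"Abcd", "def", "abcd ABCD", flags=re.IGNORECASE),  # like /gi
]

for line in results:
    print(line)
# def abc
# abc
# defat
# defc
# wilma and fred
# barney and betty
# def def
```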

After this, I would give a few examples illustrating the pitfalls of trying to match HTML tags or other strings that shouldn't be handled with regexes, to show the limitations.

SDGator
-1

Try to think of some tests whose answers can't simply be found with Google.

An email validator, for example, should be no trouble to find.

Try something like a 5-proof test.

Input: a 5-digit number. The sum of its digits must be divisible by five: 12345 → 1+2+3+4+5 = 15, and 15 / 5 = 3(.0)
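One way to set this up as an exercise, assuming the regex is only responsible for the shape of the input and ordinary code does the arithmetic. The names `FIVE_DIGITS` and `passes_five_test` are made up for this sketch in Python.

```python
import re

FIVE_DIGITS = re.compile(r"[0-9]{5}")

def passes_five_test(text):
    """True if text is exactly five digits whose sum is divisible by 5."""
    if not FIVE_DIGITS.fullmatch(text):
        return False
    return sum(int(ch) for ch in text) % 5 == 0

print(passes_five_test("12345"))  # True: 1+2+3+4+5 = 15
print(passes_five_test("12346"))  # False: the digits sum to 16
print(passes_five_test("1234"))   # False: only four digits
```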

Ralf de Kleine
  • I'd like to see that regex, if you don't mind. How do you check if the *sum* is dividable by 5? – Kobi Dec 30 '09 at 06:42
  • Regex does not count. Numerical values have no meaning in regex: they're just strings, just like 'a', 'b' and 'c'. – Bart Kiers Dec 30 '09 at 06:48