26

The regular definition for recognizing identifiers in the C programming language is given by

letter -> a|b|...|z|A|B|...|Z|_
digit -> 0|1|...|9
identifier -> letter(letter|digit)*

This definition will generate identifiers of the form

identifier: [_a-zA-Z][_a-zA-Z0-9]*

My question is: how do you limit the length of the identifiers that can be generated to no more than 31 characters? What changes need to be made to the regular definition, or how would you write a regular expression that enforces this maximum length? Could anyone please help? Thanks.

Oscar Mederos
Jeris
  • Side note, the original regex can be shortened by using negative lookahead and predefined character classes `(?!\d)\w*` – darw Oct 09 '21 at 10:56

2 Answers

38

The regular expression you are looking for is:

[_a-zA-Z][_a-zA-Z0-9]{0,30}

It matches an underscore or letter followed by X underscores, letters, or digits, where 0 <= X <= 30, so the total length is between 1 and 31 characters.
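
As a minimal sketch of how this pattern behaves at the length boundary, assuming Python's `re` module as the regex engine (the question does not name one): `is_c_identifier` is a hypothetical helper name, and `re.fullmatch` is used so that the whole string has to match, since an unanchored search would happily accept the first 31 characters of a longer name.

```python
import re

# Pattern from the answer: one leading letter or underscore,
# then at most 30 further letters, digits, or underscores (31 in total).
IDENTIFIER = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]{0,30}")


def is_c_identifier(name: str) -> bool:
    """Return True if the whole string is an identifier of 1 to 31 characters."""
    # fullmatch requires the pattern to cover the string from start to end.
    return IDENTIFIER.fullmatch(name) is not None


print(is_c_identifier("_count1"))  # True
print(is_c_identifier("a" * 31))   # True
print(is_c_identifier("a" * 32))   # False
print(is_c_identifier("1abc"))     # False
```

In engines without a full-match function, wrapping the pattern in `^` and `$` (or the engine's equivalent anchors) should give the same effect.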

andyroberts
Oscar Mederos
  • I got it the moment the other two users gave their suggestions...thanks anyways. – Jeris Feb 19 '13 at 09:39
  • @jerisalan ok. just placed my question since you asked on both answers "any possible way to change the regular definition to bring about the same change". – Oscar Mederos Feb 19 '13 at 09:39
  • 1
    Here {0,30} only restricts the length of `[_a-zA-Z0-9]`. The above regex means 1 character from `[_a-zA-Z]` and at most 30 characters from `[_a-zA-Z0-9]`. – Ojasv singh Feb 24 '21 at 11:11
0

Update: Updated the regex so that an identifier cannot start with a digit.

To limit the length, a bounded quantifier in braces {} is usually used.
For example, take the regex [_a-zA-Z0-9]+. It allows any letters, digits, and underscores, and the length must be greater than or equal to 1. To limit it to at most 31 characters, we can rewrite the regex as:

[_a-zA-Z0-9]{1,31}

{1,31} indicates that this will accept letters, digits, and underscores with a total length greater than or equal to 1 and less than or equal to 31.

However, the above regex also allows the identifier to start with a digit. Note that three ranges are provided: a-z, A-Z, and 0-9. To require the identifier to start with a letter or underscore, followed by letters, digits, or underscores, the following regex can be used:

[_a-zA-Z][_a-zA-Z0-9]{0,30}

The first portion [_a-zA-Z] forces the identifier to start with a letter or underscore. It also makes sure that the identifier is not empty. The remaining portion of the regex, [_a-zA-Z0-9]{0,30}, ensures that only letters, underscores, and digits are accepted and that, in addition to the first character, up to 30 more can be added to the identifier.

You can make the corresponding changes to your regex.
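
To make the difference between the two patterns in this answer concrete, here is a small sketch, again assuming Python's `re` module; the names `LOOSE`, `STRICT`, and `accepted_by` are made up for illustration. It runs both patterns against a few sample names, using a full match so that over-long names are rejected rather than truncated.

```python
import re

# First attempt from the answer: 1 to 31 characters, but a digit may come first.
LOOSE = re.compile(r"[_a-zA-Z0-9]{1,31}")
# Corrected pattern: the first character must be a letter or underscore.
STRICT = re.compile(r"[_a-zA-Z][_a-zA-Z0-9]{0,30}")


def accepted_by(pattern: re.Pattern, name: str) -> bool:
    """Return True if the pattern matches the entire string."""
    return pattern.fullmatch(name) is not None


# Print each sample with the verdict of the loose and the strict pattern.
for name in ["9lives", "lives9", "", "x" * 32]:
    print(repr(name), accepted_by(LOOSE, name), accepted_by(STRICT, name))
```

Only `"9lives"` should be treated differently by the two patterns: the loose one accepts it, the strict one rejects it. The empty string and the 32-character name are rejected by both.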

Ali Shah Ahmed