2

I have a string and want to check whether it can be used as a variable name without causing a syntax error. For example:

def variableName(string):
    #if string is valid variable name:
        #return True
    #else:
        #return False

input >>> variableName("validVariable")
output >>> True
input >>> variableName("992variable")
output >>> False

I would rather not use .isidentifier(); I want to write a function of my own.

anthony sottile

3 Answers

7

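In Python 3, str.isidentifier() performs this check directly:

"validVariable".isidentifier()
# True
"992variable".isidentifier()
# False

Since you changed your question after I posted the answer, consider writing a regular expression instead. The pattern below covers only "old-style" Python 2.7 (ASCII) identifiers, so it rejects some names that Python 3 accepts, such as those starting with a non-ASCII letter:

import re

re.match(r"[_a-z]\w*$", yourstring, flags=re.I)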
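
A minimal sketch of how the pattern above could be wrapped into the function from the question. The name `variable_name`, the use of `re.fullmatch`, and the `re.ASCII` flag are assumptions added for illustration: `fullmatch` avoids the case where `$` also matches just before a trailing newline, and `re.ASCII` keeps `\w` from matching non-ASCII word characters.

```python
import re

# Hypothetical helper; ASCII-only identifiers are assumed to be enough.
_ASCII_IDENTIFIER = re.compile(r"[_a-z]\w*", flags=re.I | re.ASCII)


def variable_name(string):
    """Return True if the string looks like an ASCII identifier."""
    # fullmatch requires the whole string to match, so a trailing
    # newline or any other stray character makes the check fail.
    return _ASCII_IDENTIFIER.fullmatch(string) is not None


print(variable_name("validVariable"))  # True
print(variable_name("992variable"))    # False
```

Keywords such as def still pass this check, so it is only a first filter.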
DYZ
  • OP says: "I would not like to use the .isidentifier(). I want to make a function of my own." So your solution isn't answering the question, I think. Forgive me if I'm wrong. –  Mar 17 '18 at 02:05
  • Yes, @BOi is correct. Your answer does not answer my question. I do not want to use .isidentifier(); I want to create my own function. – Radhe Krishna Mar 17 '18 at 02:08
  • @RadheKrishna You changed your question after I posted my answer. Consider using regular expressions, then. (I modified the answer.) – DYZ Mar 17 '18 at 02:09
  • @DyZ thanks a lot – Radhe Krishna Mar 17 '18 at 02:11
  • You can simplify the second part of your regular expression to `\w`, which matches letters, digits, and `_`. – Uyghur Lives Matter Mar 17 '18 at 02:14
  • `str.isidentifier()` and the regex expression aren't equivalent by any means. – Ashwini Chaudhary Mar 17 '18 at 03:56
  • Why aren't they? – DYZ Mar 17 '18 at 05:09
  • @DyZ `>>> Ä = 1 >>> print(Ä) 1` (Python 3 extends identifiers to a bunch of non-ASCII characters) – anthony sottile Mar 17 '18 at 05:50
  • I think you got things backward in your edit; [`'Ä'.isidentifier()` returns True](https://ideone.com/4upGaQ) on Python 3. The regex is the one that fails on Python 3. Also, you may want to use [`keyword.iskeyword()`](https://docs.python.org/3/library/keyword.html) to exclude keywords. – user2357112 Mar 17 '18 at 06:52
5

In Python 3, a valid identifier can contain characters outside the ASCII range. Since you don't want to use str.isidentifier, you can write your own version of it in Python.

Its specification can be found here: https://www.python.org/dev/peps/pep-3131/#specification-of-language-changes

Implementation:

import keyword
import re
import unicodedata


def is_other_id_start(char):
    """
    Item belongs to Other_ID_Start in
    http://unicode.org/Public/UNIDATA/PropList.txt
    """
    return bool(re.match(r'[\u1885-\u1886\u2118\u212E\u309B-\u309C]', char))


def is_other_id_continue(char):
    """
    Item belongs to Other_ID_Continue in
    http://unicode.org/Public/UNIDATA/PropList.txt
    """
    return bool(re.match(r'[\u00B7\u0387\u1369-\u1371\u19DA]', char))


def is_xid_start(char):

    # ID_Start is defined as all characters having one of
    # the general categories uppercase letters(Lu), lowercase
    # letters(Ll), titlecase letters(Lt), modifier letters(Lm),
    # other letters(Lo), letter numbers(Nl), the underscore, and
    # characters carrying the Other_ID_Start property. XID_Start
    # then closes this set under normalization, by removing all
    # characters whose NFKC normalization is not of the form
    # ID_Start ID_Continue * anymore.

    category = unicodedata.category(char)
    return (
        category in {'Lu', 'Ll', 'Lt', 'Lm', 'Lo', 'Nl'} or
        is_other_id_start(char)
    )


def is_xid_continue(char):
    # ID_Continue is defined as all characters in ID_Start, plus
    # nonspacing marks (Mn), spacing combining marks (Mc), decimal
    # number (Nd), connector punctuations (Pc), and characters
    # carrying the Other_ID_Continue property. Again, XID_Continue
    # closes this set under NFKC-normalization; it also adds U+00B7
    # to support Catalan.

    category = unicodedata.category(char)
    return (
        is_xid_start(char) or
        category in {'Mn', 'Mc', 'Nd', 'Pc'} or
        is_other_id_continue(char)
    )


def is_valid_identifier(name):
    # All identifiers are converted into the normal form NFKC
    # while parsing; comparison of identifiers is based on NFKC.
    name = unicodedata.normalize(
        'NFKC', name
    )

    # An empty string is never an identifier; this guard also
    # keeps name[0] below from raising IndexError.
    if not name:
        return False

    # check if it's a keyword
    if keyword.iskeyword(name):
        return False

    # The identifier syntax is <XID_Start> <XID_Continue>*.
    if not (is_xid_start(name[0]) or name[0] == '_'):
        return False

    return all(is_xid_continue(char) for char in name[1:])

if __name__ == '__main__':
    # From goo.gl/pvpYg6
    assert is_valid_identifier("a") is True
    assert is_valid_identifier("Z") is True
    assert is_valid_identifier("_") is True
    assert is_valid_identifier("b0") is True
    assert is_valid_identifier("bc") is True
    assert is_valid_identifier("b_") is True
    assert is_valid_identifier("µ") is True

    assert is_valid_identifier("") is False
    assert is_valid_identifier(" ") is False
    assert is_valid_identifier("[") is False
    assert is_valid_identifier("©") is False
    assert is_valid_identifier("0") is False
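
To see why the assertions above come out the way they do, it can help to look at the two pieces of Unicode data the implementation relies on: the general category and the NFKC-normalized form of each character. The snippet below is only an illustrative sketch; `describe` and `samples` are hypothetical names, not part of the answer's code.

```python
import unicodedata

# Sample characters taken from the assertions above.
samples = ["a", "_", "0", "µ", "©", " ", "["]


def describe(char):
    """Return (general category, NFKC-normalized form) for one character."""
    return unicodedata.category(char), unicodedata.normalize("NFKC", char)


for char in samples:
    category, normalized = describe(char)
    print(f"{char!r}: category={category}, NFKC={normalized!r}")
```

The underscore is in category Pc, which is why it has to be special-cased as a first character; "0" is Nd, so it is allowed only after the first character; "©" is So, which is not allowed anywhere.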

You can check CPython's and PyPy's implementations here and here, respectively.

Community
Ashwini Chaudhary
0

You could use a regular expression.

For example:

import re
isValidIdentifier = re.fullmatch(r"[A-Za-z_][0-9A-Za-z_]*", identifier) is not None

Note that this only checks for ASCII letters, digits, and underscores. The actual standard supports other characters. See here: https://www.python.org/dev/peps/pep-3131/

You may also need to exclude reserved words such as def, True, False, and so on. See here: https://www.programiz.com/python-programming/keywords-identifier
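
If a regular expression is not wanted either, the same ASCII-only rule plus the reserved-word exclusion can be written as a plain loop. This is a rough sketch under the assumption that only ASCII names matter; `is_valid_name` and the two character sets are hypothetical names chosen for the example, while `keyword.iskeyword` is the standard-library check for reserved words.

```python
import keyword
import string

# Assumption: only ASCII letters, digits and underscores are allowed.
_FIRST_CHARS = set(string.ascii_letters + "_")
_OTHER_CHARS = _FIRST_CHARS | set(string.digits)


def is_valid_name(name):
    """Return True for an ASCII identifier that is not a reserved word."""
    # The first character must be a letter or an underscore.
    if not name or name[0] not in _FIRST_CHARS:
        return False
    # Every later character may additionally be a digit.
    if any(char not in _OTHER_CHARS for char in name[1:]):
        return False
    # keyword.iskeyword() knows the reserved words of the running interpreter.
    return not keyword.iskeyword(name)


for candidate in ["validVariable", "992variable", "def", "True"]:
    print(candidate, is_valid_name(candidate))
```

Only the first candidate is accepted: "992variable" starts with a digit, and "def" and "True" are reserved words.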

Alain T.