Only accept alphanumeric characters and underscores for a string in Python

Question

I'm currently writing validation code for a tool parameter in ArcMap 10 (`updateMessages`) and need to prevent users from entering non-alphanumeric characters in a string, as it will be used to name a newly created field in a feature class.

So far I have used `str.isalnum()`, but this of course rejects underscores. Is there an efficient way to accept only alphanumeric characters and underscores?

if self.params[3].altered:
  #Check if field name already exists
  if str(self.params[3].value) in [f.name for f in arcpy.ListFields(str(self.params[0].value))]:
    self.params[3].setErrorMessage("A field with this name already exists in the data set.")
  #Check for invalid characters
  elif not str(self.params[3].value).isalnum():
    self.params[3].setErrorMessage("There are invalid characters in the field name.")   
  else:
    self.params[3].clearMessage()

return

score 11 · Answer 1 · answered Jun 07 '13 at 11:09


Try regular expressions:

import re
if re.match(r'^[A-Za-z0-9_]+$', text):
    # do stuff


bwind


Right, missed the alpha part. :) – bwind Jun 07 '13 at 11:11

`re.match` matches from the start so `^` is redundant – jamylak Jun 07 '13 at 11:14

I believe it is good practice to be specific when it comes to regular expressions. This way the expression is portable, too. – bwind Jun 07 '13 at 11:19
then you should use `re.search` – jamylak Jun 07 '13 at 11:20
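
To make the `re.match` versus `re.search` point from these comments concrete, here is a small sketch (the variable names are only for illustration). `re.match` tries the pattern at position 0 only, so a leading `^` does not change its result; with `re.search`, the `^` is what stops the pattern from matching the valid tail of an invalid string.

```python
import re

BAD = '-field_1'  # starts with a character that should be rejected

# re.match only attempts a match at the start of the string,
# so the pattern behaves the same with or without the leading ^.
match_with_caret = bool(re.match(r'^[A-Za-z0-9_]+$', BAD))
match_without_caret = bool(re.match(r'[A-Za-z0-9_]+$', BAD))

# re.search scans forward; without ^ it finds 'field_1' inside BAD.
search_without_caret = bool(re.search(r'[A-Za-z0-9_]+$', BAD))
search_with_caret = bool(re.search(r'^[A-Za-z0-9_]+$', BAD))

print(match_with_caret, match_without_caret)    # False False
print(search_without_caret, search_with_caret)  # True False
```

So keeping the `^` is harmless with `re.match`, and it keeps the pattern correct if someone later switches the call to `re.search`.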

jamylak · Accepted Answer · 2017-06-08T13:40:30.443


import re
if re.match(r'^\w+$', text):
    pass

edited Jun 08 '17 at 13:40

answered Jun 07 '13 at 11:08


PEP8 — one statement in one line -> `\npass` :) – Peter Varo Jun 07 '13 at 11:16

@PeterVaro if it suits your fancy ;) I use one line `if`s for very simple statements though – jamylak Jun 07 '13 at 11:17
Thanks! BTW it's not about fanciness: if the example and answer code here on Stack Overflow uses *"good code format"*, then we teach these conventions to anyone who is starting programming or is not familiar with Python. And actually, our job gets easier: reading well-formatted code is always a pleasure. – Peter Varo Jun 07 '13 at 11:21
@PeterVaro Of course, but also note that PEP-8 is like the Pirate's code *"the Code is more what you'd call guidelines than actual rules."*, I follow it **almost fully** however sometimes I notice situations where a one line `if` is beneficial to the code. In this case I see you are correct to promote two lines though – jamylak Jun 07 '13 at 11:23
This will return a match even if there is a non-alphanumeric or underscore at the beginning. Shouldn't it be `re.match(r'^\w+$', text)` to ensure the _entire_ string doesn't contain invalid characters? That way you could do ```if not re.match(r'^\w+$', text): # handle invalid input ``` – Cameron Gagnon Feb 03 '17 at 02:55

@PaulKenjora Wrong. `\w When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore` https://docs.python.org/2/library/re.html – jamylak Jul 28 '17 at 02:27
@PaulKenjora `>>> bool(re.match(r'^\w+$', '-'))` `False` See, No match for a dash. – jamylak Jul 28 '17 at 02:28
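
One caveat when the goal is to validate the entire string: `$` also matches just before a trailing newline, so `^\w+$` accepts `'name\n'`. Below is a minimal sketch of a stricter check, assuming a hypothetical helper called `is_word_only` (not from the answer above). It uses `\Z`, which matches only at the very end of the string, and an explicit character class so the behaviour is the same in Python 2 and 3.

```python
import re

def is_word_only(text):
    """Return True if text consists solely of A-Z, a-z, 0-9 and underscore.

    Hypothetical helper for illustration only.
    """
    # \Z matches only at the very end of the string, unlike $,
    # which also matches just before a trailing newline.
    return re.match(r'[A-Za-z0-9_]+\Z', text) is not None

# The $-anchored pattern lets a trailing newline through.
dollar_accepts_newline = bool(re.match(r'^\w+$', 'name\n'))

print(dollar_accepts_newline)        # True
print(is_word_only('name\n'))        # False
print(is_word_only('field_name_1'))  # True
```

On Python 3.4 or later, `re.fullmatch(r'[A-Za-z0-9_]+', text)` should give the same result without explicit anchors.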

score 3 · Answer 3 · answered Mar 15 '21 at 16:44


An alternative for this specific case, without regular expressions:

if text.replace('_', '').isalnum():
    # do stuff

You can also restrict it to ASCII characters only (`str.isascii()` requires Python 3.7 or later):

if text.replace('_', '').isalnum() and text.isascii():
    # do stuff
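
As a rough sketch, the two checks above can be wrapped in one function. The name `is_valid_name` is made up for this example, and it assumes Python 3.7+ because of `str.isascii()`. Note one difference from the regex answers: a string made only of underscores becomes empty after `replace`, and `''.isalnum()` is `False`, so `'___'` is rejected here while `^\w+$` would accept it.

```python
def is_valid_name(text):
    """Return True for ASCII letters, digits and underscores only.

    Hypothetical helper; requires Python 3.7+ for str.isascii().
    """
    # Remove underscores, then let isalnum() judge what is left.
    stripped = text.replace('_', '')
    # isascii() filters out non-ASCII letters and digits,
    # which isalnum() alone would accept.
    return stripped.isalnum() and text.isascii()

print(is_valid_name('field_1'))    # True
print(is_valid_name('field-1'))    # False
print(is_valid_name('caf\u00e9'))  # False (non-ASCII letter)
print(is_valid_name('___'))        # False (nothing left after replace)
```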


JB RNLT


score 0 · Answer 4 · answered Oct 12 '17 at 05:34

In Python 3, `\w` in a `str` pattern also matches non-ASCII letters and digits by default. If your string may contain non-ASCII characters and you want to reject them, compile the regex with the `re.ASCII` (`re.A`) flag.

import sys
import re

# re.A (re.ASCII) exists only in Python 3
if sys.version_info >= (3, 0):
    _w = re.compile(r"^\w+$", re.A)
else:
    _w = re.compile(r"^\w+$")

if _w.match(text):
    pass

For more information please refer to here.
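
To see what the flag changes, here is a short Python 3 only sketch (the pattern names are made up for this example). It compiles the same pattern with and without `re.ASCII` and tests a string containing a non-ASCII letter.

```python
import re

# Same pattern, compiled with and without the ASCII flag.
UNICODE_WORD = re.compile(r'^\w+$')
ASCII_WORD = re.compile(r'^\w+$', re.ASCII)

name = 'caf\u00e9'  # 'café', ends in a non-ASCII letter

unicode_result = bool(UNICODE_WORD.match(name))
ascii_result = bool(ASCII_WORD.match(name))

print(unicode_result)  # True: \w matches Unicode word characters by default
print(ascii_result)    # False: with re.ASCII, \w means [a-zA-Z0-9_]
```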
