9

I'm currently writing validation code for a tool parameter in ArcMap 10 (updateMessages) and need to prevent users from using non-alphanumeric characters within a string as it will be used to name a newly created field in a feature class.

So far I have used `str.isalnum()`, but this of course rejects underscores. Is there an efficient way to accept only alphanumeric characters and underscores?

if self.params[3].altered:
  #Check if field name already exists
  if str(self.params[3].value) in [f.name for f in arcpy.ListFields(str(self.params[0].value))]:
    self.params[3].setErrorMessage("A field with this name already exists in the data set.")
  #Check for invalid characters
  elif not str(self.params[3].value).isalnum():
    self.params[3].setErrorMessage("There are invalid characters in the field name.")   
  else:
    self.params[3].clearMessage()

return
Ankur Agarwal
Howeitzer

4 Answers

11

Try regular expressions:

import re
if re.match(r'^[A-Za-z0-9_]+$', text):
    pass  # do stuff
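
One detail worth knowing: `$` also matches just before a trailing newline, so `'abc\n'` passes the pattern above. Below is a minimal sketch of a stricter check that anchors with `\Z` instead. The helper name `is_valid_field_name` is hypothetical and not part of arcpy or the original post.

```python
import re

# Hypothetical helper, not part of arcpy. \Z anchors at the very end of the
# string, so a trailing newline is rejected (unlike `$`).
_FIELD_NAME_RE = re.compile(r'[A-Za-z0-9_]+\Z')


def is_valid_field_name(name):
    """Return True if name contains only ASCII letters, digits and underscores."""
    # re.match already anchors at the start of the string, so no `^` is needed.
    return _FIELD_NAME_RE.match(name) is not None
```

In the validation code from the question, something like `elif not is_valid_field_name(str(self.params[3].value)):` could replace the `isalnum()` test.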
bwind
3
import re
if re.match(r'^\w+$', text):
    pass
jamylak
  • PEP8 — one statement in one line -> `\npass` :) – Peter Varo Jun 07 '13 at 11:16
  • @PeterVaro if it suits your fancy ;) I use one line `if`s for very simple statements though – jamylak Jun 07 '13 at 11:17
  • Thanks! BTW it's not about fanciness -> if these "example" and "answer" codes here, on StackOverflow, will use the *"good code format"*, then we will teach these conventions to anyone, who is starting programming, or not familiar with python. And actually, our job is going to be easier: reading a well-formatted code is always a pleasure. – Peter Varo Jun 07 '13 at 11:21
  • @PeterVaro Of course, but also note that PEP-8 is like the Pirate's code *"the Code is more what you'd call guidelines than actual rules."*, I follow it **almost fully** however sometimes I notice situations where a one line `if` is beneficial to the code. In this case I see you are correct to promote two lines though – jamylak Jun 07 '13 at 11:23
  • This will return a match even if there is a non-alphanumeric or underscore at the beginning. Shouldn't it be `re.match(r'^\w+$', text)` to ensure the _entire_ string doesn't contain invalid characters? That way you could do ```if not re.match(r'^\w+$', text): # handle invalid input ``` – Cameron Gagnon Feb 03 '17 at 02:55
  • @PaulKenjora Wrong. `\w When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore` https://docs.python.org/2/library/re.html – jamylak Jul 28 '17 at 02:27
  • @PaulKenjora `>>> bool(re.match(r'^\w+$', '-'))` `False` See, No match for a dash. – jamylak Jul 28 '17 at 02:28
3

An alternative that avoids regular expressions for this specific case:

if text.replace('_', '').isalnum():
    pass  # do stuff

You can also restrict the check to ASCII characters (note that `str.isascii()` requires Python 3.7 or later):

if text.replace('_', '').isalnum() and text.isascii():
    pass  # do stuff
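
Note that the `replace` approach rejects a name made up only of underscores, because `''.isalnum()` is `False`, and that `str.isascii()` is not available on the Python 2 interpreter shipped with ArcMap 10. Here is a rough sketch of a version-independent alternative that checks each character against an explicit whitelist. The names `ALLOWED` and `only_allowed_chars` are made up for illustration.

```python
import string

# Hypothetical names, chosen for this example only.
# The whitelist is ASCII letters, digits and the underscore.
ALLOWED = set(string.ascii_letters + string.digits + "_")


def only_allowed_chars(text):
    """Return True if text is non-empty and every character is whitelisted."""
    return len(text) > 0 and all(ch in ALLOWED for ch in text)
```

Unlike the `replace` version, this accepts `'___'`. Whether that is desirable depends on what the target data source allows as a field name.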
JB RNLT
0

If you are using Python 3 and your string may contain non-ASCII characters, it is better to compile the regex with the ASCII flag (`re.A`), so that `\w` matches only `[a-zA-Z0-9_]` instead of any Unicode word character.

import sys
import re

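if sys.version_info >= (3, 0):
    # Python 3: \w is Unicode-aware by default, so force ASCII-only matching
    _w = re.compile(r"^\w+$", re.A)
else:
    # Python 2: \w is already ASCII-only when no flags are given
    _w = re.compile(r"^\w+$")

if _w.match(text):
    pass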

For more information please refer to here.
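
To make the difference concrete, here is a small Python 3 sketch comparing the default (Unicode) behaviour of `\w` with the `re.ASCII` variant. The variable and function names are invented for this example.

```python
import re

# Python 3 only: for str patterns, \w matches Unicode word characters by default.
unicode_w = re.compile(r'\w+\Z')
# With re.ASCII (alias re.A), \w is limited to [a-zA-Z0-9_].
ascii_w = re.compile(r'\w+\Z', re.ASCII)


def matches(pattern, text):
    """Return True if the whole of text matches the compiled pattern."""
    return pattern.match(text) is not None
```

With these patterns, `matches(unicode_w, 'caf\u00e9')` is true while `matches(ascii_w, 'caf\u00e9')` is false. Both accept a plain name such as `'cafe_1'`.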

Lerner Zhang