python: regex match string and only string

Question

I'm trying to create a small text-restriction program in Python. The user inputs some text and some filters (a for alphabetic, n for numeric, etc.). Filters can be combined (a and n for alphanumeric, etc.), but I stumbled upon this:

if re.match("[a-zA-Z]", textToRestrict):
    return True
else:
    return False

Here's where things fall apart. With only the alphabetic filter, the program should accept only alphabetic strings such as dance. However, the if statement still returns True if textToRestrict is, say, dance1234 (incorrect), while 1234dance returns False (correct).

Conversely, if I test for digits via [0-9], it still returns True even if the text contains alphabetic characters, provided that the first character is a digit.

How do I use regex to match only a certain type of character, in such a way that adding another type (like alphabetic + digit) makes text containing either type return True?

UPDATE: This is the approach I used for multiple filters:

regex = ""
if FilterClass.ALPHABETIC in arguments:
    regex += "[a-zA-Z]"
if FilterClass.CAPITAL_ALPHABETIC in arguments:
    regex += "[A-Z]"
if FilterClass.NUMERIC in arguments:
    regex += "\d"
if FilterClass.SPECIAL_CHARACTERS in arguments:
    regex += "[^0-9a-zA-Z]*"
if FilterClass.DASH_UNDERSCORES in arguments:
    regex += "[-_]*"            

regall = "^(" + regex + ")+$"

if re.match(regall, textToRestrict):
    return True
else:
    return False

arguments is a parameter supplied by the user. The if statements check which filters it contains and, supposedly, append the corresponding patterns to the regex string.

Your regex doesn't check where in the string it is, and it doesn't match against anything more than the first character. `^[a-zA-Z]+$` matches from the start to the end of the string, requiring one or more alphabetic characters and nothing else. – Mike McMahon, Dec 01 '15 at 20:22
@SirParselot, because ideally, the user can input alpha, digit, special characters, or any combination of those, for as long as they use the proper filter. I'm not sure `isalpha()` and `isdigit()` could be used for that. — zack_falcon, Dec 01 '15 at 20:26

score 5 · Accepted Answer · edited May 23 '17 at 12:32

Add anchors at both ends of the regex, plus a quantifier (+ if you want to exclude the empty string; * if you want to permit the empty string). Right now, you're just checking to see whether the first character (singular) is alphabetic (i.e. matches [a-zA-Z]).

What you want is:

re.match("^[a-zA-Z]+$", textToRestrict)

(Or, if your filters are really this simple, consider using string methods like str.isalpha instead, as SirParselot suggests in a comment.)
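
A minimal sketch of the difference, using the example strings from the question. The helper names are made up for illustration, and `re.fullmatch` (available since Python 3.4) is an alternative this answer does not mention; it requires the whole string to match without explicit anchors.

```python
import re

def first_char_only(text):
    # The original check: re.match starts at the beginning of the string,
    # and the pattern consumes exactly one character.
    return re.match("[a-zA-Z]", text) is not None

def whole_string(text):
    # Anchors plus a quantifier: one or more letters from start to end.
    return re.match("^[a-zA-Z]+$", text) is not None

def whole_string_fullmatch(text):
    # Same idea with re.fullmatch, which needs no ^ or $.
    return re.fullmatch("[a-zA-Z]+", text) is not None

for sample in ["dance", "dance1234", "1234dance", ""]:
    print(repr(sample), first_char_only(sample),
          whole_string(sample), whole_string_fullmatch(sample))
```

One difference worth knowing: `$` also matches just before a trailing newline, so `whole_string("dance\n")` is True while `whole_string_fullmatch("dance\n")` is False.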

edited May 23 '17 at 12:32 · Community
answered Dec 01 '15 at 20:21 · senshin

the left anchor isn't needed if you're testing with `match()` though. – Felk Dec 01 '15 at 20:22
@Felk Strictly speaking, you're right - it isn't. I'd prefer to have it in there anyway for explicitness's sake. – senshin Dec 01 '15 at 20:23
That worked for one, thanks. If treated as a string, can I just add on to that pattern, or do I need to have the `+$` at the very end? I've updated my post above to show how I currently do things for clarity. – zack_falcon Dec 01 '15 at 20:29
@zack_falcon The `+` means "match the thing immediately before me 1 or more times". The `$` means "match the end of the string". If you want to match alphanumerics, you'd need something like `^[0-9a-zA-Z]+$` if you want to use regexes. If you want to build these regexes piece by piece, you'd want to use an alternation like `[0-9]|[a-zA-Z]` and then wrap it in `^(...)+$`. – senshin Dec 01 '15 at 20:34
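
The piece-by-piece construction described in the comment above could look roughly like this. It is only a sketch: the filter names and the `FILTER_CLASSES` mapping are hypothetical stand-ins for the asker's `FilterClass` constants, and treating an empty filter list as "accept nothing" is an assumption.

```python
import re

# Hypothetical filter names, each mapped to one character class.
FILTER_CLASSES = {
    "alphabetic": "[a-zA-Z]",
    "capital_alphabetic": "[A-Z]",
    "numeric": "[0-9]",
    "dash_underscores": "[-_]",
}

def build_pattern(filters):
    # Join the selected classes with "|" so that each character may come
    # from any of them, then wrap the alternation in ^(...)+$.
    alternatives = "|".join(FILTER_CLASSES[name] for name in filters)
    return "^(" + alternatives + ")+$"

def is_allowed(text, filters):
    if not filters:
        return False  # assumption: no filters selected means nothing passes
    return re.match(build_pattern(filters), text) is not None

print(build_pattern(["numeric", "alphabetic"]))
print(is_allowed("dance1234", ["alphabetic", "numeric"]))
print(is_allowed("dance1234", ["alphabetic"]))
```

Note that the individual classes carry no quantifier of their own here; the single `+` after the group does the repetition.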
Isn't the pipe an or? Would that mean it would check if there's a `[0-9]` or a `[a-zA-Z]` until the end? – zack_falcon Dec 01 '15 at 20:49
@zack_falcon Yes. See for yourself: https://regex101.com/r/bJ1rK2/1 – senshin Dec 01 '15 at 20:51
I see. It went back to square one (accepting texts that match only one type), but according to that link (thanks, btw), removing the `|` treats it as an `and`. I've updated my code above, though for some reason, it now returns a negative whenever I combine anything. – zack_falcon Dec 01 '15 at 20:59
@zack_falcon I think you should find a tutorial on regexes and read it. When you remove the alternation pipe `|`, you get `^([0-9][a-zA-Z])+$`, which is an "and" only in the loosest sense of the word. This means "match strings that consist of 2-character numeric-followed-by-alphabetic subunits, repeated 1 or more times", which is almost surely not what you want (matches `1a2b3c` but not `123abc`, e.g.). – senshin Dec 01 '15 at 21:01
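
A quick way to see the contrast between the two patterns from this comment thread. The variable names are illustrative only.

```python
import re

# Concatenation: a digit THEN a letter, as a 2-character unit, repeated.
concatenated = re.compile(r"^([0-9][a-zA-Z])+$")
# Alternation: a digit OR a letter, one character at a time, repeated.
alternated = re.compile(r"^([0-9]|[a-zA-Z])+$")

for sample in ["1a2b3c", "123abc", "abc123"]:
    print(sample,
          bool(concatenated.match(sample)),
          bool(alternated.match(sample)))
# 1a2b3c True True
# 123abc False True
# abc123 False True
```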
Yes, sorry, my knowledge on regex is basic. The regex with the `|` going by your sample, also matches the sample texts `12345` and `abcdefg`, which isn't what the user would ask for if they input both alphabetic and numeric as filters, so I'm not sure what else to put in. Then there's also special characters, spaces, etc. to deal with, as filters. – zack_falcon Dec 01 '15 at 21:12
@zack_falcon So you're saying that if the user picks the "numeric" and "alphabetic" filters, they would never enter purely numeric or purely alphabetic text? That seems like a bad assumption. If you want to enforce that anyway, you will either have to create some very convoluted regexes, or do some additional input validation outside of your regexes. I recommend the latter. – senshin Dec 01 '15 at 21:14
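
If the stricter behavior really is wanted (every selected filter must actually appear in the text), the "additional validation outside of your regexes" might be sketched as below. The filter names, the `REQUIRED_CLASSES` mapping, and the function name are hypothetical, and the comment above argues that this requirement is probably a bad assumption in the first place.

```python
import re

# Hypothetical filter names, each mapped to one character class.
REQUIRED_CLASSES = {
    "alphabetic": "[a-zA-Z]",
    "numeric": "[0-9]",
}

def matches_all_filters(text, filters):
    classes = [REQUIRED_CLASSES[name] for name in filters]
    if not classes:
        return False  # assumption: no filters selected means nothing passes
    # Step 1: every character must belong to one of the selected classes.
    allowed = "(" + "|".join(classes) + ")+"
    if re.fullmatch(allowed, text) is None:
        return False
    # Step 2, outside the main regex: each selected class must occur
    # at least once somewhere in the text.
    return all(re.search(cls, text) for cls in classes)

print(matches_all_filters("dance1234", ["alphabetic", "numeric"]))
print(matches_all_filters("12345", ["alphabetic", "numeric"]))
```

Keeping the second step as plain Python avoids packing lookaheads into the pattern, at the cost of scanning the text once per selected filter.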
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/96742/discussion-between-zack-falcon-and-senshin). – zack_falcon Dec 01 '15 at 23:28
