
Update:

This question was an epic failure, but here's the working solution. It's based on Gumbo's answer (Gumbo's was close to working so I chose it as the accepted answer):

Solution:

r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$'

Original Question (albeit, after 3 edits)

I'm using Python and I'm not trying to extract the value, but rather test to make sure it fits the pattern.

allowed values:

spam123-spam-eggs-eggs1
spam123-eggs123
spam
1234
eggs123

Not allowed values:

eggs1-
-spam123
spam--spam

I just can't have a dash at the start or the end. There is a question on here that works in the opposite direction by extracting the string value after the fact, but I simply need to test the value so that I can disallow it. Also, it must be at least 4 and at most 25 characters long, and no two dashes can be adjacent.
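
As a rough sketch of how these rules could be checked in Python, the snippet below runs the pattern from the Solution section against the allowed and disallowed values listed above. The helper name `is_valid` is made up for this illustration and is not part of the original question.

```python
import re

# Pattern taken from the "Solution" section of the question.
PATTERN = re.compile(r'(?=[a-zA-Z0-9\-]{4,25}$)^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$')

def is_valid(value):
    """Return True if value fits the pattern (hypothetical helper)."""
    return PATTERN.match(value) is not None

allowed = ["spam123-spam-eggs-eggs1", "spam123-eggs123", "spam", "1234", "eggs123"]
not_allowed = ["eggs1-", "-spam123", "spam--spam"]

# Print each sample next to the result of the check.
for value in allowed + not_allowed:
    print(value, is_valid(value))
```

Note that `$` also matches just before a trailing newline, so `re.fullmatch` or `\Z` may be a better fit if the input can contain one.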

Here's what I've come up with after some experimentation with lookbehind, etc:

# Nothing here
orokusaki
  • Would you mind actually completing your question before posting it? It's impolite to keep adding constraints (minimum 4 characters, max 25) after people start answering your question. – Seth Johnson Mar 26 '10 at 17:32
  • You mention a minimum of 4 characters, but in your example you include "123" as an allowed value. Should that be in the not allowed column? – Daniel Stutzbach Mar 26 '10 at 18:06
  • Nowhere in your description does it say that you only want to allow letters, numbers and dashes. Furthermore you kept changing the question. How is anyone supposed to answer this without getting a downvote? – synic Mar 26 '10 at 18:19
  • The additional `[a-zA-Z0-9]+` at the end is not necessary; `(\-[a-zA-Z0-9]+)*` is already covering that. – Gumbo Mar 26 '10 at 19:06
  • @Gumbo Thanks, I misinterpreted that part, but now I'm reading it as (a dash followed by `alnum` characters, and zero or more repetitions of this pattern). In fact, not only was it unnecessary, it actually worked incorrectly. If the string `i-am-string-number-5` was searched against that re, it would return `None`, because the final `5` is consumed by the hyphen group, leaving nothing for the trailing `[a-zA-Z0-9]+` to match. Thanks for all your help man. I've edited my solution. – orokusaki Mar 29 '10 at 02:08
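
The failure described in the last comment can be reproduced with a small sketch. The exact earlier pattern is not quoted in the thread, so `with_extra` below is a reconstruction based on the comments (an assumption), with the redundant `[a-zA-Z0-9]+` appended after the hyphen group.

```python
import re

# Assumed earlier variant: redundant trailing [a-zA-Z0-9]+ after the group.
with_extra = re.compile(r'^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*[a-zA-Z0-9]+$')
# Variant without the trailing part, as suggested in the comments.
without_extra = re.compile(r'^[a-zA-Z0-9]+(\-[a-zA-Z0-9]+)*$')

s = "i-am-string-number-5"

# The last segment is the single character "5". The trailing part needs at
# least one alphanumeric of its own, but the group cannot end on a hyphen,
# so there is nothing left for it and the match fails.
print(with_extra.match(s))
print(without_extra.match(s) is not None)
```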

4 Answers


Try this regular expression:

^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$

This regular expression only allows hyphens as separators between sequences of one or more characters from [a-zA-Z0-9].


Edit: Following up on your comment: the expression (…)* allows the part inside the group to be repeated zero or more times. That means

a(bc)*

is the same as

a|abc|abcbc|abcbcbc|abcbcbcbc|…
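
A small Python sketch of this equivalence, using `re.fullmatch` so that the whole string has to fit `a(bc)*`; the sample strings are chosen purely for illustration.

```python
import re

pattern = re.compile(r'a(bc)*')

# "a" followed by zero or more complete "bc" groups fits; anything else does not.
for s in ["a", "abc", "abcbc", "abcbcbc", "ab", "bc"]:
    print(s, pattern.fullmatch(s) is not None)
```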

Edit: Now that you have changed the requirements: as you probably don’t want to restrict the length of each hyphen-separated part, you will need a look-ahead assertion to take the overall length into account:

(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$
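
To illustrate how the look-ahead bounds the total length without constraining the individual parts, here is a minimal sketch that probes the boundaries at 3, 4, 25 and 26 characters. The helper `ok` is a hypothetical name used only for this example.

```python
import re

# Pattern from the answer above; the look-ahead checks total length only.
pattern = re.compile(r'(?=[a-zA-Z0-9-]{4,25}$)^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$')

def ok(value):
    # Hypothetical helper: True when the whole value fits the pattern.
    return pattern.match(value) is not None

# Values just below, at, and just above the length limits.
for value in ["abc", "abcd", "a-b", "a-bc", "a" * 25, "a" * 26]:
    print(len(value), value, ok(value))
```

Because the look-ahead consumes no characters, the main part of the pattern still starts matching at the beginning of the string.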
Gumbo
  • 24 seconds faster than me! Aside: you disallow sequential dashes, and ignore the {4,25} length restrictions requested by OP. (Which I also missed upon first reading of the question...) – ephemient Mar 26 '10 at 17:29
  • @orokusaki: The `*` quantifier allows the part inside the group `(…)` to be repeated zero or more times. That means no repetition is also possible. – Gumbo Mar 26 '10 at 17:33
  • @ephemient: You didn't miss them, the OP added them later. And has kept adding stuff (no consecutive dashes). – Seth Johnson Mar 26 '10 at 17:40
  • @orokusaki: you started out with "anything made with letters or dashes, except the start or end can't be dashes". Then you added the `{4,25}` requirement. Then you added "no two consecutive dashes". None of your initial examples showed your additions. – Seth Johnson Mar 26 '10 at 17:57
  • @Gumbo Thanks for taking the time to edit after I changed everything. The only issues with your solution is 1) that it doesn't mention the hyphen in your lookahead and 2) In the pattern, you didn't escape the hyphen (which is a special char), but I've posted a solution in my question based on your answer. – orokusaki Mar 26 '10 at 18:52
  • @orokusaki: Ah you’re right, thanks! But the hyphen does not need to be escaped inside a character class if it is placed at the start or the end, and outside of character classes it never needs escaping. – Gumbo Mar 26 '10 at 18:58
  • @Gumbo Thanks. I didn't know that bit about hyphens. – orokusaki Mar 29 '10 at 02:03
  • @Gumbo One more thing: Is it OK to still escape the hyphen, or is it bad practice? (For me it felt more conventional, but I don't know if there are implications.) – orokusaki Mar 29 '10 at 02:17
  • @orokusaki: It’s semantically irrelevant. But it’s making the regular expression a little less readable. And since you’re using Python … ;-) – Gumbo Mar 30 '10 at 08:11
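
The point about hyphens made in the comments above can be checked with a short sketch in Python's `re` module. The sample strings are arbitrary and chosen only for illustration.

```python
import re

escaped = re.compile(r'[a-zA-Z0-9\-]+')
unescaped = re.compile(r'[a-zA-Z0-9-]+')  # hyphen last in the class: literal

# Both classes accept and reject the same inputs.
for s in ["spam-eggs", "spam_eggs", "-"]:
    print(s, escaped.fullmatch(s) is not None, unescaped.fullmatch(s) is not None)

# Between two characters the hyphen forms a range instead of a literal.
print(re.fullmatch(r'[a-c]', 'b') is not None)  # 'b' lies inside the range a..c
print(re.fullmatch(r'[a-c]', '-') is not None)  # the hyphen itself is not matched
```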

The current regex is simple and fairly readable. Rather than making it long and complicated, have you considered applying the other constraints with normal Python string processing tools?

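import re

def fits_pattern(string):
    # Length and hyphen-placement rules are checked with plain string methods.
    if (4 <= len(string) <= 25 and
        "--" not in string and
        not string.startswith("-") and
        not string.endswith("-")):

        # fullmatch checks every character; re.match with a bare character
        # class would only test the first one.
        return re.fullmatch(r"[a-zA-Z0-9-]+", string)
    else:
        return None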
Mike Graham
  • That might have gone a bit overboard with the not-putting-it-in-the-regex, but the general idea is worth considering. As the old adage goes: *Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.* – Mike Graham Mar 26 '10 at 18:16

It should be something like this:

^[a-zA-Z0-9]+(-[a-zA-Z0-9]+)*$

You are telling it to look for only one char: either a-z, A-Z, 0-9 or -. That is what [] does.

So if you write [abc], you will match only "a", "b" or "c", not "abc".
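
A quick sketch of this behaviour in Python, using `re.fullmatch` so that the entire string has to be consumed:

```python
import re

# A character class on its own matches exactly one character from the set.
print(re.fullmatch(r'[abc]', 'a') is not None)
print(re.fullmatch(r'[abc]', 'abc') is not None)

# Adding + allows one or more characters from the set.
print(re.fullmatch(r'[abc]+', 'abc') is not None)
```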

Have fun.

jpabluz
  • @jpabluz I only put the regex in the title to show the allowed chars. I'm going to use + or * of course, but I wanted to demonstrate which chars are allowed. – orokusaki Mar 26 '10 at 17:33

If you simply don't want a dash at the end and beginning, try ^[^-].*?[^-]$

Edit: Bah, you keep changing it.

synic