I have the following regex:

from re import compile
alphanumeric = compile(r'^[\w\d ]+$')

I'm running the following data against this regex:

Tomkiewicz Zigomalas Andrade Mcwalters 

I have a separate regex to identify alpha-only data, yet the data above, which contains no digits, still matches the alphanumeric regex.

Edit: How do I stop alpha-only data from matching the regex above?
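A quick way to see why the names are accepted is a minimal check, assuming Python 3's `re` module. `\w` already covers letters (as well as digits and the underscore), and the space inside the class lets the separators through, so a line made only of letters and spaces satisfies `^[\w\d ]+$`. The class restricts which characters may appear; it never requires a digit.

```python
import re

# The OP's pattern: one or more word characters, digits, or spaces, anchored at both ends.
alphanumeric = re.compile(r'^[\w\d ]+$')

data = 'Tomkiewicz Zigomalas Andrade Mcwalters'

# \w includes letters, so letters plus spaces are enough to match.
print(bool(alphanumeric.match(data)))     # True

# Nothing in the class forces a digit (or a letter) to be present.
print(bool(alphanumeric.match('abc')))    # True
print(bool(alphanumeric.match('123')))    # True

# Only characters outside the class cause a failure.
print(bool(alphanumeric.match('ab-12')))  # False: '-' is not in the class
```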

Glitchezz
  • A character class in brackets matches _any_ of the expressions. `[\w\d]` means "either a number, or a letter." If you only want letters, remove the `\d`. – g.d.d.c Apr 10 '14 at 19:20
  • I want letters and numbers. I want it to match only alphanumeric data, yet it also matches data made up only of letters. – Glitchezz Apr 10 '14 at 19:21
  • @g.d.d.c `\w` includes numbers as well :-) – hjpotter92 Apr 10 '14 at 19:21
  • So "alphanumeric data" must contain at least one digit and at least one letter? E.g. "1a" is alphanumeric, but "1" and "a" aren't? – Kevin Apr 10 '14 at 19:23
  • @Kevin Yes, that's correct. – Glitchezz Apr 10 '14 at 19:24
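Kevin's restatement (at least one letter and at least one digit) can also be checked with plain string tests instead of a regex. This is a minimal sketch of that alternative, assuming the allowed characters are ASCII letters, digits, and spaces as in the OP's data; `is_alphanumeric_mix` and `ALLOWED` are hypothetical names chosen for illustration.

```python
import string

# Assumption: only ASCII letters, digits, and spaces are permitted.
ALLOWED = set(string.ascii_letters + string.digits + ' ')

def is_alphanumeric_mix(s: str) -> bool:
    """Hypothetical helper: True if s is non-empty, uses only allowed
    characters, and contains at least one letter and at least one digit."""
    return (
        bool(s)
        and set(s) <= ALLOWED
        and any(c in string.ascii_letters for c in s)
        and any(c in string.digits for c in s)
    )

print(is_alphanumeric_mix('1a'))        # True
print(is_alphanumeric_mix('1'))         # False
print(is_alphanumeric_mix('a'))         # False
print(is_alphanumeric_mix('Tomkiewicz Zigomalas Andrade Mcwalters'))  # False
```

Splitting the rule into separate checks keeps each condition readable; the trade-off is that it does not plug into code that expects a compiled pattern.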

2 Answers

Description: an alphanumeric string (at least one letter and at least one digit) can take one of two forms:

  1. It starts with one or more digits, then a letter, followed by any number of alphanumeric characters.
  2. It starts with one or more letters, then a digit, followed by any number of alphanumeric characters.
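The two forms translate directly into an alternation. The sketch below (Python 3) adds the `^` and `$` anchors and assumes no spaces are allowed. The alternation sits inside a non-capturing group `(?:...)` so that both anchors and the trailing `[\dA-Z]*` apply to both forms; without the group, `^a|b$` anchors only one side of each alternative. The `re.ASCII` flag is an addition of mine so that `\d` and the case-insensitive `[A-Z]` stay within ASCII.

```python
import re

# Form 1: digits, then a letter.  Form 2: letters, then a digit.
# Either prefix may be followed by any mix of letters and digits.
# (?:...) groups the alternatives so ^ and $ apply to both.
an_re = re.compile(r'^(?:\d+[A-Z]|[A-Z]+\d)[\dA-Z]*$', re.I | re.ASCII)

for s in ['12345', 'abcd', 'abc1', '1abc', 'a1b2c3', 'abc1!']:
    print(s, bool(an_re.match(s)))
# 12345 False
# abcd False
# abc1 True
# 1abc True
# a1b2c3 True
# abc1! False
```

This works because any string containing both kinds of character begins with a run of one kind that is immediately followed by the other kind, which is exactly what one of the two prefixes matches.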

Demo:

>>> import re
>>> an_re = r"(\d+[A-Z]|[A-Z]+\d)[\dA-Z]*"
>>> re.search(an_re, '12345', re.I) # not acceptable string
>>> re.search(an_re, 'abcd', re.I) # not acceptable string
>>> re.search(an_re, 'abc1', re.I) # acceptable string
<_sre.SRE_Match object at 0x14153e8>
>>> re.search(an_re, '1abc', re.I) # acceptable string
<_sre.SRE_Match object at 0x14153e8>
Grijesh Chauhan
  • @Glitchezz It works with the `re.I` flag. But do you understand my RE? Remember that I didn't use `\w` because it also includes `_`, and I left `^` and `$` for you as an exercise. – Grijesh Chauhan Apr 10 '14 at 19:37
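The point about `\w` is easy to check. In Python's `re` module, `\w` also matches the underscore, so a `\w`-based class treats `_` as if it were alphanumeric, whereas an explicit `[\dA-Z]` class with `re.I` accepts only letters and digits. A minimal sketch, assuming Python 3.4+ for `re.fullmatch`:

```python
import re

# \w matches letters, digits, AND the underscore.
print(bool(re.fullmatch(r'\w+', 'abc_1')))             # True

# An explicit class with re.I accepts only letters and digits.
print(bool(re.fullmatch(r'[\dA-Z]+', 'abc_1', re.I)))  # False

# Without re.I, [A-Z] covers only uppercase letters.
print(bool(re.fullmatch(r'[\dA-Z]+', 'abc1')))         # False
print(bool(re.fullmatch(r'[\dA-Z]+', 'abc1', re.I)))   # True
```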

Use lookaheads to assert that at least one letter and at least one digit are present:

(?=.*[a-zA-Z])(?=.*[0-9])^[\w\d ]+$

The regex above uses two lookaheads to check the entire string for each condition before any characters are consumed. Each lookahead scans forward until it finds a single character in its range; if that assertion succeeds, the engine moves on to the next one. The last part, borrowed from the OP's original attempt, ensures that the entire string consists of one or more lowercase or uppercase letters, underscores, digits, or spaces.
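The same pattern can be written with `re.VERBOSE` so that each part carries its own comment. This is a minimal sketch assuming Python 3; note that in verbose mode whitespace is ignored except inside a character class, so the space in `[\w\d ]` is still a literal space. In Python 3, `\w` also accepts non-ASCII letters, while the lookahead `[a-zA-Z]` requires at least one ASCII letter.

```python
import re

pattern = re.compile(r"""
    (?=.*[a-zA-Z])   # lookahead: at least one ASCII letter somewhere ahead
    (?=.*[0-9])      # lookahead: at least one digit somewhere ahead
    ^[\w\d ]+$       # whole string: word characters, digits, or spaces
""", re.VERBOSE)

for s in ['Tomkiewicz Zigomalas Andrade Mcwalters', 'abc1', '1abc', 'Room 101', '12345']:
    print(f'{s!r}: {bool(pattern.match(s))}')
# 'Tomkiewicz Zigomalas Andrade Mcwalters': False
# 'abc1': True
# '1abc': True
# 'Room 101': True
# '12345': False
```

The lookaheads consume nothing, so after both succeed the engine is still at position 0 and `^` matches there.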

tenub
  • tenub, I tried to understand your RE but couldn't; can you explain it a bit for me? (I have bookmarked this for tomorrow's lesson.) +1 for an advanced RE example. Thanks. – Grijesh Chauhan Apr 10 '14 at 19:57