I have a text string as follows:

text = "907525191737280e , hjjhkj789jkh 2554nagy289 2 8 2 2 7 5 2 working welcome , a dp83640as25 , dp83867 e2 e25"

I tried using a regex from the question "regex for alphanumeric only is not working" to identify only the alphanumeric words, i.e. words that contain both letters and digits.

I changed it to the following:

^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]*

but I didn't get the result I wanted. I also tried [a-zA-Z0-9]+, but that failed as well.

Desired output:

907525191737280e hjjhkj789jkh 2554nagy289 dp83640as25 dp83867 e2 e25

I am new to regex and trying to learn it. Could you please help me understand what I am missing?

Vas

3 Answers

One option is to check for a digit using a lookahead and then match at least one character in the range a-zA-Z.

You don't need the anchor ^, because it asserts the start of the string. You can use a word boundary \b instead, to make sure the match is not part of a larger word.
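To see the effect of the anchor, here is a quick check of what the question's two patterns return on the sample text. This assumes Python's re module (the question does not name a language, but the other answers use Python).

```python
import re

text = "907525191737280e , hjjhkj789jkh 2554nagy289 2 8 2 2 7 5 2 working welcome , a dp83640as25 , dp83867 e2 e25"

# First attempt: ^ anchors the match to the start of the string (no MULTILINE flag),
# so at most one match, the first word, can ever be found.
anchored = re.findall(r"^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]*", text)
print(anchored)  # ['907525191737280e']

# Second attempt: matches every run of letters/digits, with no requirement
# that the run contains both a letter and a digit.
runs = re.findall(r"[a-zA-Z0-9]+", text)
print(runs)  # also contains '2', 'working', 'a', ...
```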

\b(?=[a-zA-Z0-9]*[0-9])[a-zA-Z0-9]*[a-zA-Z][a-zA-Z0-9]*\b

In parts

  • \b Word boundary
  • (?=[a-zA-Z0-9]*[0-9]) Positive lookahead, assert a digit
  • [a-zA-Z0-9]*[a-zA-Z][a-zA-Z0-9]* Match at least one char a-zA-Z, optionally surrounded by any of the allowed chars
  • \b Word boundary
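A minimal sketch of applying the pattern, assuming Python (as in the other answers): re.findall returns every non-overlapping match, and since the lookahead is not a capturing group, each item is the whole matched word.

```python
import re

text = "907525191737280e , hjjhkj789jkh 2554nagy289 2 8 2 2 7 5 2 working welcome , a dp83640as25 , dp83867 e2 e25"

# \b ... \b                         -> the match must be a whole word
# (?=[a-zA-Z0-9]*[0-9])             -> lookahead: the word must contain a digit
# [a-zA-Z0-9]*[a-zA-Z][a-zA-Z0-9]*  -> the word must contain at least one ASCII letter
pattern = re.compile(r"\b(?=[a-zA-Z0-9]*[0-9])[a-zA-Z0-9]*[a-zA-Z][a-zA-Z0-9]*\b")

matches = pattern.findall(text)
print(" ".join(matches))
# 907525191737280e hjjhkj789jkh 2554nagy289 dp83640as25 dp83867 e2 e25
```

Joining the matches with spaces reproduces the desired output from the question.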

The fourth bird

If you simply need all words that contain at least one digit and at least one letter, this can be done with import string rather than import re, as follows:

import string
text = "907525191737280e , hjjhkj789jkh 2554nagy289 2 8 2 2 7 5 2 working welcome , a dp83640as25 , dp83867 e2 e25"
words = text.split()
anwords = [w for w in words if set(w).intersection(string.ascii_letters) and set(w).intersection(string.digits)]
print(anwords)  # ['907525191737280e', 'hjjhkj789jkh', '2554nagy289', 'dp83640as25', 'dp83867', 'e2', 'e25']

Note that this solution, like your pattern, recognizes only ASCII letters as alphabetic. re is a useful module, but some tasks are easier to get done another way.
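If non-ASCII letters should count as well, one possible variant uses str.isalpha() and str.isdigit(), which are Unicode-aware. A sketch comparing the two; the function names and the sample string are made up for illustration:

```python
import string

def mixed_words_ascii(text):
    # Same test as above: ASCII letters and ASCII digits only.
    return [w for w in text.split()
            if set(w).intersection(string.ascii_letters)
            and set(w).intersection(string.digits)]

def mixed_words_unicode(text):
    # str.isalpha()/str.isdigit() also accept non-ASCII letters and digits.
    return [w for w in text.split()
            if any(c.isalpha() for c in w) and any(c.isdigit() for c in w)]

sample = "β5 alpha 42 x9"
print(mixed_words_ascii(sample))    # ['x9']
print(mixed_words_unicode(sample))  # ['β5', 'x9']
```

On the all-ASCII text from the question, both functions return the same list.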

Daweo

Just saying, no regex is needed here, really:

text = "907525191737280e , hjjhkj789jkh 2554nagy289 2 8 2 2 7 5 2 working welcome , a dp83640as25 , dp83867 e2 e25"

alnums = [word
          for word in text.split()
          if word.isalnum()]

print(alnums)

This yields

['907525191737280e', 'hjjhkj789jkh', '2554nagy289', '2', '8', '2', '2', '7', '5', '2', 'working', 'welcome', 'a', 'dp83640as25', 'dp83867', 'e2', 'e25']


Add other conditions if needed (e.g. the length):
alnums = [word
          for word in text.split()
          if word.isalnum() and len(word) > 1]

Which would yield

['907525191737280e', 'hjjhkj789jkh', '2554nagy289', 'working', 'welcome', 'dp83640as25', 'dp83867', 'e2', 'e25']
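Neither list matches the desired output exactly, because isalnum() also accepts all-letter words such as 'working'. Following the same "add other conditions" idea, one possible extra condition rejects words that are purely letters or purely digits. This is a sketch assuming ASCII input; with other Unicode numeric characters the two checks are not a strict "has a letter and a digit" test.

```python
text = "907525191737280e , hjjhkj789jkh 2554nagy289 2 8 2 2 7 5 2 working welcome , a dp83640as25 , dp83867 e2 e25"

# isalnum() keeps words made only of letters/digits; excluding words that are
# entirely letters (isalpha) or entirely digits (isdigit) leaves the mixed ones.
alnums = [word
          for word in text.split()
          if word.isalnum() and not word.isalpha() and not word.isdigit()]

print(alnums)
# ['907525191737280e', 'hjjhkj789jkh', '2554nagy289', 'dp83640as25', 'dp83867', 'e2', 'e25']
```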

Jan