1

I am trying to extract an alphanumeric sequence from text. This sequence could be either five or six characters in length, must start and end with a number, and have at least one letter in between, for example: 25D212, 4WX07, 8FZW5, 2T784, 25XR47

This is what I was able to put together:

[0-9][[0-9]|[a-zA-Z]]{3,4}[0-9]

The issue with this solution is that it also matches

888888 (at least one character constraint not being met)

Pshemo
newbie

3 Answers

1

generalized
Based on the permutations (below), it looks like it can be generalized to this

 # (?i)\d(?=\d{0,3}[a-z])[a-z\d]{3,4}\d

 (?i)
 \d                   # A digit
 (?= \d{0,3} [a-z] )  # a letter in the next 1 to 4 characters
 [a-z\d]{3,4}         # 3 to 4 digits or letters
 \d                   # A digit
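A minimal sketch of applying the generalized pattern, assuming Java is the host language. The class and method names (`GeneralizedMatch`, `findAll`) are made up for illustration; only the regex itself comes from this answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GeneralizedMatch {
    // The generalized pattern from this answer; backslashes are doubled
    // because this is a Java string literal.
    private static final Pattern CODE =
            Pattern.compile("(?i)\\d(?=\\d{0,3}[a-z])[a-z\\d]{3,4}\\d");

    // Hypothetical helper: collects every non-overlapping match, left to right.
    static List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = CODE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [25D212, 4WX07, 8FZW5, 2T784, 25XR47]
        System.out.println(findAll("for example: 25D212, 4WX07, 8FZW5, 2T784, 25XR47"));
        // prints [] because the lookahead never finds a letter
        System.out.println(findAll("888888"));
    }
}
```

Note that the pattern has no word boundaries, so it can also match inside a longer alphanumeric run; whether that matters depends on the input.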

permutations

 # (?i)\d(?:[a-z][a-z\d]{2,3}|\d[a-z][a-z\d]{1,2}|\d\d[a-z][a-z\d]{0,1}|\d\d\d[a-z])\d

 (?i)
 \d 
 (?:
      [a-z] 
      [a-z\d]{2,3} 
   |  
      \d 
      [a-z] 
      [a-z\d]{1,2} 
   |  
      \d\d 
      [a-z] 
      [a-z\d]{0,1} 
   |  
      \d\d\d 
      [a-z] 
 )
 \d

Input

for example: 25D212, 4WX07, 8FZW5, 2T784, 25XR47

Output

 **  Grp 0 -  ( pos 13 , len 6 ) 
25D212  

 **  Grp 0 -  ( pos 21 , len 5 ) 
4WX07  

 **  Grp 0 -  ( pos 28 , len 5 ) 
8FZW5  

 **  Grp 0 -  ( pos 35 , len 5 ) 
2T784  

 **  Grp 0 -  ( pos 42 , len 6 ) 
25XR47  
0

You may be able to solve this with one regex pattern, but you can definitely solve it with two patterns.

First pattern will be something like:

\d[0-9a-zA-Z]{3,4}\d

(Note: \d is the same as [0-9])

Second pattern will be this:

\d+[a-zA-Z]+\d+

The first pattern controls the size of the string, the second confirms that it contains at least one alpha character.
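A rough Java sketch of the two-pattern check. It assumes the candidate token has already been isolated from the surrounding text, and it assumes the second pattern is applied with `find()` rather than `matches()`, so that tokens such as `4W0X7` with non-adjacent letters still pass. The names `TwoStepCheck` and `isValid` are hypothetical.

```java
import java.util.regex.Pattern;

class TwoStepCheck {
    // First pattern: a digit, 3 to 4 alphanumerics, a digit (5 or 6 characters total).
    private static final Pattern SHAPE =
            Pattern.compile("\\d[0-9a-zA-Z]{3,4}\\d");

    // Second pattern: at least one letter sitting between digits.
    private static final Pattern HAS_LETTER =
            Pattern.compile("\\d+[a-zA-Z]+\\d+");

    // Hypothetical helper: true only when both checks pass.
    static boolean isValid(String candidate) {
        return SHAPE.matcher(candidate).matches()      // whole token must fit the shape
                && HAS_LETTER.matcher(candidate).find(); // a letter run must occur somewhere
    }

    public static void main(String[] args) {
        System.out.println(isValid("25D212")); // true
        System.out.println(isValid("888888")); // false, no letter
        System.out.println(isValid("4W0X7"));  // true
    }
}
```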

Good resource for testing Java regex patterns: http://www.regexplanet.com/advanced/java/index.html

Looks like the Avinash Raj post is the correct answer. I'm leaving mine as an option (suboptimal as it may be).

nhahtdh
DwB
-1

@Avinash Raj already has a good-looking pattern, but I wanted to share my alternative:

\b(?=[0-9a-zA-Z]{5,6}\b)\d.*[a-zA-Z].*\d\b

This may be easier to read if you split it into two parts: the look-ahead assertion, which tests for a 5-6 character run of alphanumerics, and the main pattern, which tests the format (starts and ends with a digit, with a letter somewhere in the middle). This roughly corresponds to how you phrased the question, so it may be easier to remember later what the pattern does.

nhahtdh
dcsohl
  • The main match will overmatch due to `.*` – nhahtdh Jan 12 '15 at 19:32
  • @nhahtdh It actually won't. Did you even try it? The lookahead assertion validates that all characters are alpha-numeric. This pattern *does not match*, e.g., `1#,W89`. – dcsohl Jan 12 '15 at 20:38
  • https://regex101.com/r/iK3iE4/1. Failed on `02344 sf4`. Once you manage to pass the assertion, all hell breaks loose. – nhahtdh Jan 12 '15 at 21:09
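The two inputs discussed in the comments can be checked with a short Java sketch. It assumes the pattern is used with `find()` on free text; the names `OvermatchDemo` and `firstMatch` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OvermatchDemo {
    // The look-ahead pattern from the answer above, escaped for a Java string literal.
    private static final Pattern CODE =
            Pattern.compile("\\b(?=[0-9a-zA-Z]{5,6}\\b)\\d.*[a-zA-Z].*\\d\\b");

    // Hypothetical helper: returns the first match in the text, or null if there is none.
    static String firstMatch(String text) {
        Matcher m = CODE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // No match: the lookahead rejects "1#,W89" at every starting position.
        System.out.println(firstMatch("1#,W89"));
        // Overmatch: the lookahead is satisfied by "02344" alone, after which
        // ".*" is free to run across the space, so the whole input is matched.
        System.out.println(firstMatch("02344 sf4"));
    }
}
```

Under these assumptions both comments hold: the lookahead does reject `1#,W89`, but it only constrains the first 5 to 6 characters, so `.*` can still consume characters beyond them.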