4

I have some data that comes in as a String, and I need to extract or print the monthvalue (the middle group) from input of the form:

[itemvalue] [monthvalue] [yearvalue]

The rules are:

itemvalue = 1 to 3 characters (letters or digits)

monthvalue = a single alphabetic character [a-z]

yearvalue = 1, 2, or 4 digits representing the calendar year

Some Example Inputs:

Input1

AP18

Output1

P

Input2

QZAB19

Output2

B

Input3

ARM8

Output3

M

I was trying to compile a pattern like:

Pattern pattern = Pattern.compile("([a-zA-Z0-9]{1,3})([a-z])([0-9]{1,4})");

and then call matcher on the input and use find() to get the groups. In this case I want the monthvalue, which should be matcher.group(2):

Matcher matcher = pattern.matcher("OneOfTheExampleInputStringsFromAbove");

if (matcher.find()) {
    System.out.println(matcher.group(2));
}

I thought I was close, but one issue is how to allow a yearvalue length of 1, 2, or 4 while excluding a length of 3. Is my approach good? Am I missing anything in my compiled pattern?

Please let me know!

VLAZ
ennth
  • Q: Will every item code always include all three parts: itemValue, monthValue and yearValue? Q: What's the rule for determining how long itemValue is: whether it's one, two or three characters? – FoggyDay Jun 08 '20 at 04:36
  • Yes, each input will have three parts. There is no rule for determining itemvalue length, it can just be 1 to 3 characters or digits at random. That's why I thought regex was the best approach. – ennth Jun 08 '20 at 04:43
  • You can use an OR condition (alternation) to exclude length 3. – jnrdn0011 Jun 08 '20 at 04:45
  • @ennth; if any of the below answer helps; please accept the one which best suits you and close this post. It'll help the answerer as well as the future readers of this post. –  Jun 17 '20 at 08:30

4 Answers

2

Try this:

([\w]{1,3})(\D)([\d]{1,4})

Examples:

https://www.freeformatter.com/java-regex-tester.html#ad-output

Input     Match:
-----     -----
AP18      (A)(P)(18)
QZAB19    (QZA)(B)(19)
ARM8      (AR)(M)(8)
QZAB123   (QZA)(B)(123)
QZAB1234  (QZA)(B)(1234)
A123      No match
1234      No match
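
As a rough sketch of how the table above could be reproduced in Java, the pattern can be compiled once and the middle group read with group(2). The class name LooseMonthExtractor and the helper monthOf are hypothetical names chosen for illustration; they are not part of the original answer. Note that the backslashes have to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LooseMonthExtractor {
    // The pattern from this answer, with backslashes doubled for a Java string literal.
    private static final Pattern PATTERN = Pattern.compile("([\\w]{1,3})(\\D)([\\d]{1,4})");

    // Hypothetical helper: returns the middle (month) group, or null when nothing matches.
    static String monthOf(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {"AP18", "QZAB19", "ARM8", "QZAB123", "A123", "1234"};
        for (String s : inputs) {
            System.out.println(s + " -> " + monthOf(s));
        }
    }
}
```

Because this pattern is unanchored and allows 1 to 4 year digits, it extracts the month but does not reject a 3-digit year such as the one in QZAB123.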
FoggyDay
  • But wouldn't \d{1,4} on the yearvalue pick up a length of 3? That violates the yearvalue rule. – ennth Jun 08 '20 at 05:03
  • The regex will do its job - it'll extract the values just fine. If you also want to validate potentially bad input... then I'd recommend checking for *all* possible errors in Java, after you've extracted the values. If you *really* want to exclude "3", [mandy8055](https://stackoverflow.com/a/62255317/3135317) showed you how to combine "or" with a ["non-capture subpattern"](https://stackoverflow.com/questions/3705842/). Please consider "upvoting" and "accepting" his (most excellent!) reply. – FoggyDay Jun 08 '20 at 17:27
2

Your regex is correct. To add your last requirement you may try:

^\w{1,3}([a-zA-Z])(?:\d{1,2}|\d{4})$
                   ^^^^^^^^^^^^^^^^
                    This part

Explanation of the above regex:

^, $ - Represent the start and end of the line, respectively.

\w{1,3} - Matches a character from [0-9A-Za-z_] 1 to 3 times. If there is a chance that your test string contains _, use [0-9A-Za-z] here instead.

([a-zA-Z]) - Represents a capturing group matching a single letter.

(?:\d{1,2}|\d{4}) - Represents a non-capturing group matching 1, 2, or 4 digits, but not 3.

You can find the above regex demo in here.


Implementation in java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    // MULTILINE makes ^ and $ match at the start and end of every line.
    private static final Pattern pattern = Pattern.compile("^\\w{1,3}([a-zA-Z])(?:\\d{1,2}|\\d{4})$", Pattern.MULTILINE);

    public static void main(String[] args) {
        final String string = "QZAB19\n"
                + "AP18\n"
                + "ARM8\n"
                + "ARM803"; // This won't match since the year value has 3 digits.
        Matcher matcher = pattern.matcher(string);
        while (matcher.find()) {
            System.out.println(matcher.group(1)); // Group 1 captures the month value.
        }
    }
}

You can find the sample run of the above code in here.

Community
1

If you are looking for something other than a pattern-matching solution, the below could help:

String txt = "QZAB19";
String month = txt.replaceAll("[0-9]", ""); // removes all digits
System.out.println(month.charAt(month.length() - 1)); // the last remaining character is the month

Output:

B
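
Since replaceAll still takes a regular expression as its first argument, a variant that avoids regex entirely might look like the sketch below. It assumes the input always ends with the year digits, as stated in the question, and performs no other validation. MonthByScan and monthOf are hypothetical names used only for this illustration.

```java
class MonthByScan {
    // Walks left from the end of the string past the trailing year digits;
    // the character just before them is taken to be the month letter.
    static char monthOf(String txt) {
        int i = txt.length() - 1;
        while (i >= 0 && Character.isDigit(txt.charAt(i))) {
            i--;
        }
        if (i < 0) {
            // The input was empty or consisted only of digits.
            throw new IllegalArgumentException("no month letter in: " + txt);
        }
        return txt.charAt(i);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"AP18", "QZAB19", "ARM8"}) {
            System.out.println(s + " -> " + monthOf(s));
        }
    }
}
```

Like the replaceAll version, this does not check the itemvalue length or the number of year digits.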
Vishwa Ratna
1
Pattern pattern = Pattern.compile("^([a-zA-Z0-9]{1,3})([a-zA-Z])(([0-9]{1,2})|([0-9]{4}))$");

You should use $ to anchor the match at the end of the string; otherwise, the condition restricting the number of digits at the end of the string doesn't work.
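
A small sketch of why the anchors matter, assuming the same pattern body with and without ^ and $. The class AnchorDemo and its two helper methods are hypothetical names for illustration. With find(), the unanchored pattern is free to match only a prefix such as ARM80 inside ARM803, so the 3-digit year slips through.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // Same pattern body as above, compiled with and without the ^...$ anchors.
    private static final String BODY = "([a-zA-Z0-9]{1,3})([a-zA-Z])(([0-9]{1,2})|([0-9]{4}))";
    private static final Pattern UNANCHORED = Pattern.compile(BODY);
    private static final Pattern ANCHORED = Pattern.compile("^" + BODY + "$");

    // find() succeeds if any substring of the input matches the pattern.
    static boolean foundUnanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // With ^ and $, the match has to span the whole input.
    static boolean foundAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"ARM8", "ARM803"}) {
            System.out.println(s + ": unanchored=" + foundUnanchored(s)
                    + ", anchored=" + foundAnchored(s));
        }
    }
}
```

Calling Matcher.matches() instead of find() also requires the entire input to match, so it is another way to get the same effect without writing the anchors.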

jnrdn0011
  • Ah, okay, this is what I was getting at also. What exactly do you mean by putting the $ at the end? And every time I see people put the $ at the end, they also use ^ at the front. How is this different from not using ^...$, if I may ask? Thank you. – ennth Jun 08 '20 at 05:01
  • Yes, you have to use ^ as well to anchor the start of the match. I am suggesting this because I can see that you are trying to match the whole input string, but if you just want to pick the matching part out of a larger string or file, then you don't need ^ and $. – jnrdn0011 Jun 08 '20 at 05:21