I am trying to extract the substring that follows a particular code, for example:

String sample1 = "/ASDF/096/GHJKL/WER/WER/dv/7906/CODEM/TEAR1331927498xxxxxx/YUII/OPL";
String sample2 = "/CODEM/TEAR1331927498xxxxxx";

String regExpresssion = "[/CODEM/]{6}(^[a-zA-Z0-9|\\s])?";
final Pattern pattern = Pattern.compile(regExpresssion);
final Matcher matcher = pattern.matcher(sample1);
if (matcher.find()) {
  String subStringOut = sample1.substring(matcher.end());
}

subStringOut for sample 1  > TEAR1331927498xxxxxx/YUII/OPL
subStringOut for sample 2  > TEAR1331927498xxxxxx

The above code works fine, but now I need to add one more identifier, '/CODER/', to the regex for the sample below:

String sample3 = "/ASDF/096/GHJKL/WER/WER/dv/7906/CODER/TEAR1331927498xxxxxx/YUII/OPL";

I have tried

String regExpresssion = "[/CODEM/|/CODER/]{6}(^[a-zA-Z0-9|\\s])?"; 

but it is not working. Any suggestions guys?

Thanks!!

Neeti
  • can you try this example to demonstrate the problem? https://regex101.com/r/UOWTje/1 – Maxim Shoustin May 21 '20 at 17:32
  • I am confused by your regex and what exactly it is meant to match. File paths? – Slackow May 21 '20 at 17:38
  • You seem to just need `String regExpresssion = "/CODE[MR]/"`, see [demo](https://regex101.com/r/6b8G1T/1). Your `[/CODEM/]{6}(^[a-zA-Z0-9|\s])?` regex is a mess and just wrong. It is equal to the `[/CODEM]{6}` regex, since `(^[a-zA-Z0-9|\s])?` never matches anything: there cannot be a start of string after 6 specific chars in a string. You must be searching for a *sequence of characters*, and if so, you must remove `[` and `]{6}` from that pattern and remove all redundant parts. Sure, you may also use `/(CODEM|CODER)/`, but `/CODE[RM]/` is more concise. – Wiktor Stribiżew May 21 '20 at 17:39

2 Answers

Try replacing `[/CODEM/|/CODER/]{6}` with `/CODE[RM]/`.

I think you meant to match the entire phrase `/CODEM/` or `/CODER/`, but the way you wrote it, the pattern accepts any 6-character run made up of those characters, in any order. I'm not entirely sure, though. Square brackets denote a "character class", which matches only a single character; to match a fixed sequence, write the characters literally, and use parentheses with `|` when you need alternatives. The second part does not make sense to me either: the caret (`^`) sits in the middle of the pattern, where it anchors to the beginning of the input (or of a line in multiline mode), so it can never match there.
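To make the difference concrete, here is a minimal sketch rather than a definitive implementation. The class name `AfterCodeDemo`, the helper `afterMarker`, and the input `MODE/CODE` are made up for illustration; the three sample strings are the ones from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AfterCodeDemo {
    // Literal "/CODE", then exactly one of 'M' or 'R', then a literal "/".
    private static final Pattern MARKER = Pattern.compile("/CODE[MR]/");

    // Returns the text after the first marker, or null when no marker is present.
    static String afterMarker(String input) {
        Matcher matcher = MARKER.matcher(input);
        return matcher.find() ? input.substring(matcher.end()) : null;
    }

    // The original character-class pattern: any 6 characters drawn from the set / C O D E M.
    static boolean looseClassFinds(String input) {
        return Pattern.compile("[/CODEM/]{6}").matcher(input).find();
    }

    public static void main(String[] args) {
        String sample1 = "/ASDF/096/GHJKL/WER/WER/dv/7906/CODEM/TEAR1331927498xxxxxx/YUII/OPL";
        String sample2 = "/CODEM/TEAR1331927498xxxxxx";
        String sample3 = "/ASDF/096/GHJKL/WER/WER/dv/7906/CODER/TEAR1331927498xxxxxx/YUII/OPL";

        System.out.println(afterMarker(sample1));
        System.out.println(afterMarker(sample2));
        System.out.println(afterMarker(sample3));

        // The character class also accepts input that contains no "/CODEM/" at all.
        System.out.println(looseClassFinds("MODE/CODE"));
    }
}
```

The last call shows why the character-class version is risky: `MODE/C` is six characters from the set, so the pattern finds a match even though the marker never appears.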

Slackow
  • Thanks, Wiktor and Slackow. The regex `/(CODEM|CODER)/` resolved my problem. `/CODE[RM]/` also worked, but I may get a completely different phrase such as RESET, and `/(CODEM|RESET)/` handles that. – Neeti May 21 '20 at 17:59
  • Niks, if you meant there to be words to match other than `"CODEM"` and `"CODER"` you should have said so in the question. Note that both expressions match `"ENCODER"`, so you may want to add word breaks. – Cary Swoveland May 21 '20 at 18:35
  • also something to keep in mind is that something like `/(CODE[RM]|RESET)/` works just fine as well – Slackow May 21 '20 at 19:00
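The alternation suggested in the comments could be sketched roughly as follows. This assumes the unrelated marker really is `/RESET/`; the class name `AlternationDemo`, the helper `afterMarker`, and the `/RESET/` input string are hypothetical, since the question gives no sample for that case.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AlternationDemo {
    // Non-capturing group: either CODEM/CODER (via the character class) or RESET,
    // each wrapped in literal slashes.
    private static final Pattern MARKER = Pattern.compile("/(?:CODE[MR]|RESET)/");

    // Returns the text after the first marker, or null when no marker is present.
    static String afterMarker(String input) {
        Matcher matcher = MARKER.matcher(input);
        return matcher.find() ? input.substring(matcher.end()) : null;
    }

    public static void main(String[] args) {
        // Sample from the question.
        System.out.println(afterMarker("/CODEM/TEAR1331927498xxxxxx"));
        // Made-up input for the RESET case.
        System.out.println(afterMarker("/ASDF/096/RESET/TEAR1331927498xxxxxx/YUII"));
    }
}
```

A non-capturing group `(?:...)` is used here only because the captured text is not needed; a plain `(...)` group behaves the same for this purpose.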

You just need a single lookbehind assertion.
Try `(?<=/CODE[MR]/).*`

PCRE demo (the pattern also works in Java in this case)
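A minimal sketch of the lookbehind approach in Java, assuming single-line input (without the `DOTALL` flag, `.` stops at a line terminator). The class name `LookbehindDemo` and the helper `afterMarker` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LookbehindDemo {
    // The lookbehind requires "/CODEM/" or "/CODER/" immediately before the match,
    // but the marker itself is not part of the matched text.
    private static final Pattern AFTER = Pattern.compile("(?<=/CODE[MR]/).*");

    // Returns everything after the first marker, or null when no marker is present.
    static String afterMarker(String input) {
        Matcher matcher = AFTER.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(afterMarker("/ASDF/096/GHJKL/WER/WER/dv/7906/CODER/TEAR1331927498xxxxxx/YUII/OPL"));
        System.out.println(afterMarker("/CODEM/TEAR1331927498xxxxxx"));
    }
}
```

With this pattern the match itself is the wanted substring, so `matcher.group()` replaces the `substring(matcher.end())` call from the question.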