0

I have the following program:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class Regex {
    public static void main(String[] args) {
        String VALID_GUID_REGEX = "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}";
        Pattern NOT_PREFIXED_FILES_REGEX =
                Pattern.compile("(^"+VALID_GUID_REGEX+"/\\b(foo|bar)\\b.*)|^[^/]+$");

        List<String> list = new ArrayList<>();
        list.add("256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a");
        list.add("256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a");
        list.add("govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/foo/text.doc");
        list.add("156a5037-9fc1-4e60-95c3-523d5ae1c935/123a5037-9fc1-4e60-95c3-523d5ae1c935/delta/text.doc");
        list.add("123a5037-9fc1-4e60-95c3-523d5ae1c935/");

        String[] keys = list.stream()
                .filter(k -> NOT_PREFIXED_FILES_REGEX.matcher(k).find())
                .toArray(String[]::new);

        System.out.println(Arrays.toString(keys));
    }
}

The code works fine except for the last item in the list. I need the following conditions to be satisfied:

256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a -- Pass
256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a -- Pass
govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/foo/text.doc -- Fail
156a5037-9fc1-4e60-95c3-523d5ae1c935/123a5037-9fc1-4e60-95c3-523d5ae1c935/foo/text.doc -- Fail
123a5037-9fc1-4e60-95c3-523d5ae1c935/ -- Pass

Consider the second line:

256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a 

If my input is "256a5037-9fc1-4e60-95c3-523d5ae1c935/", it should pass, and "256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/" should pass as well. I am getting the file paths from a server.

Now consider the failing cases: "govcorp/" should fail, and "govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/" should fail.

A path with two consecutive GUID segments should fail, such as:

156a5037-9fc1-4e60-95c3-523d5ae1c935/123a5037-9fc1-4e60-95c3-523d5ae1c935/ - FAIL

A path with only one GUID, such as "123e4567-e89b-12d3-a456-426655440001/", should pass.

prostý člověk
  • You should escape `/`. After fixing this in both places, it works fine for me – XtremeBaumer Jun 07 '19 at 14:34
  • I'm not sure I understand your question correctly but I guess that `123a5037-9fc1-4e60-95c3-523d5ae1c935/` should match but it doesn't. Is that the problem you're facing? If so, have a look at your regex: it either requires a guid followed by "foo" or "bar" or a string without any slash (`^[^/]+$`). Your last item starts with a guid but doesn't contain "foo" or "bar" but it contains a slash so the second alternative doesn't match either. – Thomas Jun 07 '19 at 14:35
  • Could you elaborate on the possible input and what exactly you need to match? Currently your regex seems to indicate that any guid must not be preceded by anything and must be followed by foo or bar. Any string that doesn't start with a guid must not contain a slash - so how would your last example fit into that? – Thomas Jun 07 '19 at 14:41
  • For instance, consider the first line, 256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a. If my input is 256a5037-9fc1-4e60-95c3-523d5ae1c935/ it should pass, and 256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/ should pass too, and so on. My regex should match all the combinations listed in the pass/fail cases. For example, if my input starts with govcorp/ -- Fail. If 156a5037-9fc1-4e60-95c3-523d5ae1c935/123a5037-9fc1-4e60-95c3-523d5ae1c935/ -- Pass. – prostý člověk Jun 07 '19 at 14:53
  • `Let's consider fail case, "govcorp/" - Fail and "govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/" - Fail` Your sample has only one instance of `govcorp` that fails. The other failing sample doesn't have it. So what's up? You're not being clear. I'm voting to close! –  Jun 07 '19 at 16:26
  • @sln the last item is not matched, and that is my concern here, i.e. 123a5037-9fc1-4e60-95c3-523d5ae1c935/ – prostý člověk Jun 07 '19 at 16:45
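
The diagnosis in the comments can be checked by running each alternative of the question's pattern separately against the failing key. The sketch below does that, and also tries one possible adjustment in which the foo/bar tail is optional and nothing else may follow the first GUID segment. The class name `GuidAlternatives`, the helper `found`, and the `ADJUSTED` pattern are assumptions introduced here for illustration; they do not come from the thread.

```java
import java.util.regex.Pattern;

class GuidAlternatives {
    static final String GUID =
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}";

    // First alternative from the question: GUID, slash, then "foo" or "bar".
    static final Pattern GUID_THEN_FOO_OR_BAR =
            Pattern.compile("^" + GUID + "/\\b(foo|bar)\\b.*");

    // Second alternative from the question: a string with no slash at all.
    static final Pattern NO_SLASH = Pattern.compile("^[^/]+$");

    // Hypothetical adjustment (an assumption, not from the thread):
    // the foo/bar tail is optional, and nothing else may follow "GUID/".
    static final Pattern ADJUSTED =
            Pattern.compile("^" + GUID + "/(?:\\b(?:foo|bar)\\b.*)?$|^[^/]+$");

    // Same call the question uses: find() rather than matches().
    static boolean found(Pattern pattern, String key) {
        return pattern.matcher(key).find();
    }

    public static void main(String[] args) {
        String key = "123a5037-9fc1-4e60-95c3-523d5ae1c935/";
        System.out.println(found(GUID_THEN_FOO_OR_BAR, key)); // false: no foo/bar after the slash
        System.out.println(found(NO_SLASH, key));             // false: the key contains a slash
        System.out.println(found(ADJUSTED, key));             // true: the tail is optional here
    }
}
```

Under the same assumptions, `ADJUSTED` still rejects keys that start with `govcorp/` and keys whose first GUID segment is followed by a second GUID, because the only things allowed after the first slash are the end of the string or a `foo`/`bar` segment.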

2 Answers

3

Here, we could first reject the undesired strings with a simple expression:

^((?!\.doc).)*$

Then, for the remaining strings, we would design a second expression. In this case the GUID part of your original expression works just fine, and we might just want to wrap it in a capturing group:

([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})

Test

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GuidMatchTest {
    public static void main(String[] args) {
        // The GUID expression wrapped in a capturing group.
        final String regex = "([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})";
        // Only the strings that survived the first (".doc") filter.
        final String string = "256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a\n"
             + "256a5037-9fc1-4e60-95c3-523d5ae1c935/bar/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a\n"
             + "123a5037-9fc1-4e60-95c3-523d5ae1c935/";

        final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        final Matcher matcher = pattern.matcher(string);

        // Print every GUID found, followed by its capturing groups.
        while (matcher.find()) {
            System.out.println("Full match: " + matcher.group(0));
            for (int i = 1; i <= matcher.groupCount(); i++) {
                System.out.println("Group " + i + ": " + matcher.group(i));
            }
        }
    }
}
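
The two steps described above could be chained into a single filter over the question's list. This is only a minimal sketch: the class name `TwoStepFilter` and the method `keep` are hypothetical, and it assumes a key should be kept when it contains no ".doc" and contains at least one GUID.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class TwoStepFilter {
    // Step 1: matches only strings that contain no ".doc" anywhere.
    static final Pattern NO_DOC = Pattern.compile("^((?!\\.doc).)*$");

    // Step 2: the GUID expression wrapped in a capturing group.
    static final Pattern GUID = Pattern.compile(
            "([0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})");

    // Keeps the keys that pass step 1 and contain a GUID (step 2).
    static List<String> keep(List<String> keys) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (NO_DOC.matcher(key).matches() && GUID.matcher(key).find()) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "256a5037-9fc1-4e60-95c3-523d5ae1c935/foo/44434038019,2019-05-24T09:02:18.695Z,b4786bf4-157a-4f1b-a030-4c5416e1884a",
                "govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/foo/text.doc",
                "123a5037-9fc1-4e60-95c3-523d5ae1c935/");
        // The govcorp key is dropped by step 1 because it contains ".doc".
        keep(keys).forEach(System.out::println);
    }
}
```

Because step 1 rejects keys only by the ".doc" substring, a key such as 156a5037-9fc1-4e60-95c3-523d5ae1c935/123a5037-9fc1-4e60-95c3-523d5ae1c935/ with no ".doc" in it would still be kept by this sketch.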

Emma
1

Do you want to match all the .doc entries with the regex, or just match each whole line that contains a substring matching your existing regex, including the .doc part?

In the latter case, surround your regex with .*\b on the left and \b.* on the right, i.e. .*\b{regex}\b.*

This way, the whole line is matched, and the match is still captured.

^(.*\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12})\b.* 
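
As a rough illustration of the expression above, the sketch below matches a whole line with `matches()` and returns what group 1 captured. The class name `WholeLineGuidMatch` and the method `prefixThroughGuid` are hypothetical names used only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeLineGuidMatch {
    static final String GUID =
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}";

    // Same shape as the expression above: ^(.*\bGUID)\b.*
    static final Pattern LINE_WITH_GUID = Pattern.compile("^(.*\\b" + GUID + ")\\b.*");

    // Returns group 1 (everything up to and including a GUID) when the
    // whole line matches, or null when the line contains no GUID.
    static String prefixThroughGuid(String line) {
        Matcher matcher = LINE_WITH_GUID.matcher(line);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefixThroughGuid("123a5037-9fc1-4e60-95c3-523d5ae1c935/"));
        System.out.println(prefixThroughGuid("govcorp/123a5037-9fc1-4e60-95c3-523d5ae1c935/foo/text.doc"));
        System.out.println(prefixThroughGuid("govcorp/"));
    }
}
```

Since the leading `.*` accepts any prefix, this pattern matches every line that contains a GUID, including the ones starting with govcorp/, so it would need an extra condition to reproduce the pass/fail list in the question.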
pastaleg