I've been struggling with a Java regex. I want it to match a string that starts with one of two specific kinds of character (a letter or an underscore), followed by anything that matches the second group.

String regex = "(^[a-zA-Z | _])([a-zA-Z0-9\\-_^\\s]*)";
Pattern pattern = Pattern.compile(regex);

String s1 = "hello world";
String s2 = "_Sau90-jds";
String s3 = "5_idsjd";
String s4 = "A-next";

ArrayList<String> list = new ArrayList<>();
list.add(s1);
list.add(s2);
list.add(s3);
list.add(s4);

for (String string : list) {
    Matcher matcher = pattern.matcher(string);
    if (matcher.find()) {
        System.out.println(matcher.group(0));
    }
}

The result I want:

_Sau90-jds
A-next

But I keep getting

hello world
_Sau90-jds
A-next

My string has to start with a letter (a-zA-Z) or "_", and after that it can ONLY contain letters, digits, underscores and hyphens, which means no whitespace.

I tried String regex = "(^[a-zA-Z | _])([a-zA-Z0-9\\-_\\S]*)"

And String regex = "(^[a-zA-Z | _])([a-zA-Z0-9\\-_]*)"

but both of them gave me

hello
_Sau90-jds
A-next
Oneiros
Geek Junior

3 Answers

Brief

There are a few things in your regex that cause it to not work as you expect.

  • [a-zA-Z | _] is a character class, so every character inside it is taken literally: it matches a-z, A-Z, the | character, the space character, and _. The | does not mean "or" inside brackets.
  • [a-zA-Z0-9\-_^\s]* is also a character class, so it matches a-z, A-Z, 0-9, -, _, a literal ^ (the caret only negates when it is the first character in the class), and any whitespace character (\s).
  • Also, without the $ (end-of-line assertion) the pattern can stop partway through the string, so find() still reports hello for hello world, as you've seen in your results.
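
The three points above can be checked directly. The following is a minimal sketch, not part of the original answer: the class name CharClassDemo and its helper methods are hypothetical names chosen for illustration, and the patterns are copied from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // First group from the question: '|' and ' ' are literal members of the class.
    static final Pattern ORIGINAL_FIRST = Pattern.compile("^[a-zA-Z | _]");
    // Second class from the question: '^' is literal here and \s admits whitespace.
    static final Pattern ORIGINAL_REST = Pattern.compile("[a-zA-Z0-9\\-_^\\s]*");
    // One of the attempted fixes: no \s any more, but still no trailing '$'.
    static final Pattern UNANCHORED = Pattern.compile("(^[a-zA-Z | _])([a-zA-Z0-9\\-_]*)");

    // True if the first character is accepted by the question's first class.
    static boolean startsWithAllowed(String s) {
        return ORIGINAL_FIRST.matcher(s).find();
    }

    // True if the whole string consists of characters from the question's second class.
    static boolean restMatches(String s) {
        return ORIGINAL_REST.matcher(s).matches();
    }

    // Returns what find() reports for the unanchored pattern, or null if nothing matches.
    static String firstMatch(String s) {
        Matcher m = UNANCHORED.matcher(s);
        return m.find() ? m.group(0) : null;
    }

    public static void main(String[] args) {
        System.out.println(startsWithAllowed("|abc"));   // true: '|' is in the class
        System.out.println(startsWithAllowed(" abc"));   // true: ' ' is in the class
        System.out.println(restMatches("a^b c"));        // true: '^' and ' ' are accepted
        System.out.println(restMatches("a.b"));          // false: '.' is not in the class
        System.out.println(firstMatch("hello world"));   // hello
    }
}
```

The last line shows why the attempted fixes printed hello: find() is satisfied with a matching prefix unless the pattern is anchored at the end.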

Code

^[^\W\d][\w-]*$

It's basically the same as

^[a-zA-Z_][a-zA-Z0-9_-]*$

Results

Input

hello world
_Sau90-jds
5_idsjd
A-next

Output

_Sau90-jds
A-next

Explanation

  • ^ Assert position at the start of the line
  • [^\W\d] Match any word character except digits
  • [\w-]* Match any word character or hyphen - any number of times
  • $ Assert position at the end of the line
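
As a quick check of the explanation above, here is a small sketch that runs both forms of the pattern over the sample inputs. The class name IdentifierFilter and the method keepValid are hypothetical names introduced for this example. It uses Matcher.matches(), which requires the whole input to match, so the ^ and $ anchors are kept only to mirror the patterns as written.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class IdentifierFilter {
    // Short form from the answer: a word character that is not a digit, then word characters or hyphens.
    static final Pattern SHORT_FORM = Pattern.compile("^[^\\W\\d][\\w-]*$");
    // Spelled-out form from the answer.
    static final Pattern LONG_FORM = Pattern.compile("^[a-zA-Z_][a-zA-Z0-9_-]*$");

    // Keeps only the inputs that the given pattern matches in full.
    static List<String> keepValid(Pattern pattern, List<String> inputs) {
        return inputs.stream()
                .filter(s -> pattern.matcher(s).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("hello world", "_Sau90-jds", "5_idsjd", "A-next");
        System.out.println(keepValid(SHORT_FORM, inputs)); // [_Sau90-jds, A-next]
        System.out.println(keepValid(LONG_FORM, inputs));  // [_Sau90-jds, A-next]
    }
}
```

Both patterns agree on these inputs because Java's \w covers only [a-zA-Z_0-9] unless Unicode character classes are switched on.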
ctwheels
  • Thanks for your answer, both work fine. What did I do wrong? The groups? Or the missing "$"? – Geek Junior Nov 21 '17 at 15:32
  • @GeekJunior I've added an explanation in the **Brief** section in my answer that explains why your regex isn't working as you'd expect. – ctwheels Nov 21 '17 at 15:40

Let's take your requirements:

My string has to start with a letter a-zA-Z or "_" and then it can ONLY contain letters, digits, underscores and hyphens, which means no White spaces.

and construct the regex step by step.

  1. start with a letter a-zA-Z or "_"

    ^[a-zA-Z_]
    
  2. then it can ONLY contain letters, digits, underscores and hyphens

    [a-zA-Z\d_-]+
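
The two steps above can be joined into one pattern. The sketch below is an illustration under stated assumptions rather than this answer's own code: the class name StepByStep and the method isValid are hypothetical, and the trailing $ is an addition that the steps do not include. It also contrasts the + quantifier used in step 2 with *, since + requires at least one character after the first.

```java
import java.util.regex.Pattern;

class StepByStep {
    // Step 1: start with a letter or underscore.
    static final String FIRST = "^[a-zA-Z_]";
    // Step 2 as written: one or more letters, digits, underscores or hyphens.
    static final String REST_PLUS = "[a-zA-Z\\d_-]+";
    // Variant (assumption): zero or more, so a single-character string is accepted too.
    static final String REST_STAR = "[a-zA-Z\\d_-]*";

    // Joins step 1, the chosen step 2 and an end anchor, then tests the whole string.
    static boolean isValid(String s, String rest) {
        return Pattern.compile(FIRST + rest + "$").matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("A-next", REST_PLUS));      // true
        System.out.println(isValid("hello world", REST_PLUS)); // false: the space is not allowed
        System.out.println(isValid("A", REST_PLUS));           // false: '+' needs a second character
        System.out.println(isValid("A", REST_STAR));           // true
    }
}
```

Whether a one-character string such as A should be accepted is not stated in the question, so the choice between + and * is left open here.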
    
Maroun
  • @WiktorStribiżew It should be matched according to OP's requirements. Which is why I highlighted the parts in my answer. – Maroun Nov 21 '17 at 15:15
  • @MarounMaroun Thanks for your answer. I should have been clearer when I wrote my last result. Anyway, hello shouldn't be part of the result. – Geek Junior Nov 21 '17 at 15:35
Please try this:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Regex {

    public static void main(String[] args) {
        // First character: a letter or underscore.
        // Remaining characters: letters, digits, underscores or hyphens, up to the end of the string.
        String regex = "^[a-zA-Z_][a-zA-Z0-9_-]*$";
        Pattern pattern = Pattern.compile(regex);

        String s1 = "hello world";
        String s2 = "_Sau90-jds";
        String s3 = "5_idsjd";
        String s4 = "A-next";

        ArrayList<String> list = new ArrayList<>();
        list.add(s1);
        list.add(s2);
        list.add(s3);
        list.add(s4);

        for (String string : list) {
            Matcher matcher = pattern.matcher(string);
            // With ^ and $ in the pattern, find() only succeeds when the whole string is valid.
            if (matcher.find()) {
                System.out.println(matcher.group(0));
            }
        }
    }
}