
I am wondering how to format this expression to work in Java: [^#]+[#] (1 or more characters that are not a # followed by a #)

Using regexr.com (my favorite regex tool) this expression will get the following matches from this input text:

input:

aBc def AbC def dfe ABC
#
123
#

matches:

aBc def AbC def dfe ABC
#
123
#

However, when I call Scanner.next("[^#]+[#]") I get an InputMismatchException, which I take to mean that it didn't find any matches. Do I need to escape characters? In C# I usually avoid this problem with the verbatim string literal prefix @.

What am I missing about Java's Scanner and regex? Thanks.

  • I believe your problem may be the fact that you are trying to match across multiple lines. I would try using the single-line regex flag. You can do this by putting `(?s)` at the beginning of your regex string. – Charlie Armstrong Jul 28 '20 at 19:04
  • Thanks for the suggestion. That makes sense. I found this: https://stackoverflow.com/questions/3651725/match-multiline-text-using-regular-expression which explains more about multi-line regex. I am just going to read the file line by line instead of using regex to get multiple lines. `(?s)` didn't do the trick. I was trying to make getting input entries easy, but it looks like I need to read more on the https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html class. – TwoFingerRightClick Jul 28 '20 at 20:01
  • How do you format this regex in Java?: `[A-Za-z]+` It doesn't work (returns no matches), but for some reason `[A-z]+` does get all 6 matches in `aBc def AbC def dfe ABC` – TwoFingerRightClick Jul 28 '20 at 21:21

1 Answer


My solution was to use the Pattern and Matcher classes instead of Scanner. Scanner didn't behave as I expected with either stdin or strings, and it failed to return regex matches (using the hasNext(String) and next(Pattern) methods). If I read more and discover why, I will post here.
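One likely explanation, offered as a sketch rather than a confirmed diagnosis of the original program: Scanner.next(String) does not search the input. It takes the next token, split on the scanner's delimiter (whitespace by default), and throws InputMismatchException if that whole token does not match the pattern. With the input from the question, the first token is "aBc", which contains no #. Scanner.findWithinHorizon ignores delimiters and may be closer to what was intended. The class and method names below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.InputMismatchException;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the names are not from the original post.
class ScannerEntries {
    static final String INPUT = "aBc def AbC def dfe ABC\n#\n123\n#\n";

    // next(String) tests only the next whitespace-delimited token against the regex.
    static boolean tokenMatches(String input, String regex) {
        try (Scanner sc = new Scanner(input)) {
            sc.next(regex);
            return true;
        } catch (InputMismatchException e) {
            return false; // the token did not match the pattern as a whole
        }
    }

    // findWithinHorizon searches the raw input and ignores delimiters.
    // A horizon of 0 means the search is unbounded.
    static List<String> findEntries(String input) {
        List<String> entries = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            String match;
            while ((match = sc.findWithinHorizon("[^#]+#", 0)) != null) {
                entries.add(match);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(tokenMatches(INPUT, "[^#]+[#]")); // false
        System.out.println(findEntries(INPUT).size());       // 2
    }
}
```

Note that the negated class [^#] already matches newlines, so the `(?s)` flag makes no difference to this pattern. Each returned entry still contains its trailing # and any surrounding line breaks, so it may need trimming.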

The following successfully pulls each word (in this case a sequence of consecutive alphabetic letters) from a string:

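// Requires java.util.regex.Pattern and java.util.regex.Matcher.
Pattern wordPattern = Pattern.compile("\\p{Alpha}+");
Matcher wordFinder = wordPattern.matcher(lines.toString());
// find() moves to the next match anywhere in the input, across spaces and line breaks.
while (wordFinder.find()) {
    currentWord = wordFinder.group().toLowerCase();
    AddWord(currentWord); // AddWord is my own method (not shown)
}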

The POSIX character class "\\p{Alpha}+" could also be replaced with "[a-zA-Z]+"; by default, \p{Alpha} matches only US-ASCII letters.
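As a quick check of that equivalence, here is a minimal sketch that runs both patterns through Pattern and Matcher on the sample line from the question. It also runs `[A-z]+`, which is broader than it looks: the range from A to z includes the characters [ \ ] ^ _ and the backtick. The class name, the findAll helper, and the second test string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are not from the original post.
class WordPatterns {
    // Collects every non-overlapping match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "aBc def AbC def dfe ABC";
        System.out.println(findAll("\\p{Alpha}+", text)); // [aBc, def, AbC, def, dfe, ABC]
        System.out.println(findAll("[a-zA-Z]+", text));   // [aBc, def, AbC, def, dfe, ABC]

        // [A-z] also accepts the punctuation that sits between 'Z' and 'a' in ASCII.
        System.out.println(findAll("[a-zA-Z]+", "snake_case [x]")); // [snake, case, x]
        System.out.println(findAll("[A-z]+", "snake_case [x]"));    // [snake_case, [x]]
    }
}
```

If only letters should count as part of a word, `[a-zA-Z]+` or `\\p{Alpha}+` is the safer choice over `[A-z]+`.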