
I developed a class that extends javafx.scene.control.TextField and validates its input against an arbitrary regular expression. The next feature to implement: the text field should move focus to the next control as soon as no further character can be appended to the current input without making it invalid.

For me this boils down to the following: I have a string A and a regular expression p. Assuming p(A) is true, I need to find out whether there is a character c such that p(A+c) is also true.

I found this question, which led me to the requireEnd method of java.util.regex.Matcher, but that method only tells me whether there is a character c such that p(A+c) is false, assuming p(A) is true.

My next approach was to simply loop over all characters, append each one, and test for a full match with Matcher.matches or a partial match with Matcher.hitEnd. This works, but it has to test about 1.1 million code points on every call to cover the full Unicode range.
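To make the two flags concrete, here is a minimal sketch that runs a full-match attempt and then reads back `matches()`, `hitEnd()` and `requireEnd()`. The class name `EndFlagsDemo` and the helper `flags` are made up for this illustration; only the `java.util.regex` calls come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EndFlagsDemo {
    /** Returns {matches, hitEnd, requireEnd} after one full-match attempt. */
    static boolean[] flags(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean matched = m.matches();
        // hitEnd: the engine ran into the end of the input while matching,
        //         so additional input might change the outcome.
        // requireEnd: additional input could turn a positive match negative.
        return new boolean[] { matched, m.hitEnd(), m.requireEnd() };
    }

    public static void main(String[] args) {
        String[][] cases = {
            { "[0-9]{4}", "123" },  // too short: no match, but more digits could complete it
            { "[0-9]{4}", "12a" },  // fails at 'a' before the end of the input is reached
            { "[0-9]+",   "1234" }, // matches, and the greedy quantifier looks for more
        };
        for (String[] c : cases) {
            boolean[] f = flags(c[0], c[1]);
            System.out.printf("%-9s %-5s matches=%b hitEnd=%b requireEnd=%b%n",
                    c[0], c[1], f[0], f[1], f[2]);
        }
    }
}
```

The case `"[0-9]{4}"` against `"123"` is the "partial match" situation: `matches()` is false while `hitEnd()` is true.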

So I'm looking for an efficient way to implement the following (shortened) class:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdvancedMatcher {
    static final int MAX_UNICODE = 0x10FFFF; // highest valid Unicode code point

    final Pattern p;
    final Matcher m;
    String actInput = ""; // the current input string

    public AdvancedMatcher(String regexp) {
        p = Pattern.compile(regexp);
        m = p.matcher("");
    }

    public void reset(String input) {
        m.reset(input);
        actInput = input;
    }

    /**
     * Checks whether the current input is a positive match for the regular expression
     * and no character can be appended to it without turning the positive match into
     * a negative match.
     *
     * @return false if the current input is a negative match, or if at least one
     *         {@link Character} exists that can be appended while the input stays a
     *         full or partial match; true otherwise.
     */
    public boolean requireEnd() {
        // To be clear, "return m.requireEnd();" does not solve this problem.
        if (!m.matches())
            return false;
        // Brute force: try every code point, including MAX_UNICODE itself.
        for (int i = 0; i <= MAX_UNICODE; ++i) {
            Matcher tmpM = p.matcher(actInput + String.valueOf(Character.toChars(i)));
            if (tmpM.matches() || tmpM.hitEnd())
                return false;
        }

        return true;
    }
}
msebas
  • Well, checking if *any* n characters break the match is a lot less efficient than checking if the last n characters break the match. Am I missing something? – Helen Jul 20 '19 at 14:14
  • Also this assumes that the input has a maximum length and that the actual input isn't important, only whether it is in a certain format. It doesn't make much sense for an application that takes input. – Helen Jul 20 '19 at 14:18
  • @Helen How could I get the subpattern needed to check whether the last `n` characters break the part of the pattern that was not _consumed_ by the first `actInput.length()-n` characters of the string? If I have the pattern "[0-9]{4}" and the string "123", I need the subpattern "[0-9]{1}" for additional checks, but deriving this for an arbitrary regexp seems far from trivial to me. – msebas Jul 20 '19 at 14:55
  • It isn't possible, because *arbitrary pattern* means that the pattern may contain any condition (a set as infinite as it is varied), so there's no way to predict that when `p(A)` is true then `p(A+c)` is also true (except by testing all possible c). – Casimir et Hippolyte Jul 20 '19 at 15:51
  • Consider for instance a simple lookahead like `(?!.*a)` – Casimir et Hippolyte Jul 20 '19 at 15:57
  • @CasimiretHippolyte I only need to know whether any `c` exists such that `p(A+c)` is true. I could even check this for a small candidate set. `(?!.*a)` would only remove `'a'` from the set of possible candidates. If `P` is the regular expression string behind `p`, then it should be sufficient to test one `c` from `[d]` and one from `[^d]` for every character and character class `d` in `P`, but this is quite complicated to implement (especially taking backreferences into account). – msebas Jul 20 '19 at 16:46
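The candidate-set idea from the last comment could be approximated very roughly as below. This is only a heuristic sketch under stated assumptions, not a complete solution: it probes every code point that occurs literally in the pattern string plus a small fixed probe list, instead of deriving one representative per character class as the comment proposes. The names `CandidateProbe`, `DEFAULT_PROBES`, `candidates` and `canExtend` are hypothetical. Characters reachable only through escapes or properties (for example `\x41` or `\p{IsGreek}`) are not covered, so a `false` result can be wrong for such patterns.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CandidateProbe {
    // Assumption: a few probes meant to hit common predefined classes
    // such as \d, \w, \s and "anything else". This list is NOT exhaustive.
    private static final String DEFAULT_PROBES = "0aA _-.\n\u00e9";

    /** Code points that occur literally in the regex, plus the fixed probes. */
    static Set<Integer> candidates(String regex) {
        Set<Integer> set = new LinkedHashSet<>();
        regex.codePoints().forEach(set::add);
        DEFAULT_PROBES.codePoints().forEach(set::add);
        return set;
    }

    /**
     * Returns true if at least one candidate can be appended to the input so
     * that the result is a full match or could still become one (hitEnd).
     */
    static boolean canExtend(Pattern p, String input, Set<Integer> candidates) {
        Matcher m = p.matcher("");
        for (int cp : candidates) {
            String extended = new StringBuilder(input).appendCodePoint(cp).toString();
            m.reset(extended);
            if (m.matches() || m.hitEnd()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String regex = "[0-9]{4}";
        Pattern p = Pattern.compile(regex);
        Set<Integer> probes = candidates(regex);
        System.out.println("123  can be extended: " + canExtend(p, "123", probes));
        System.out.println("1234 can be extended: " + canExtend(p, "1234", probes));
    }
}
```

The number of matcher runs drops from about 1.1 million to a few dozen per call, at the price of possibly missing an extension for patterns whose accepted characters never appear literally in the pattern text.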

0 Answers