
I have the following regular expression in Java:

Pattern p = Pattern.compile("int|float|char\\s\\w");

But this still matches "intern" too.

Entire code:

package regex;

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class Regex {

    public static void main(String[] args) throws IOException{
        // Count the lines of new.c that contain at least one match.
        int c = 0;
        BufferedReader bf = new BufferedReader(new FileReader("new.c"));
        String line;
        Pattern p = Pattern.compile("int|float|char\\s\\w");
        Matcher m;
        while((line = bf.readLine()) != null) {
            m = p.matcher(line);
            if(m.find()) {
                c++;
            }
        }
        System.out.println(c);
    }
}
shA.t
  • Not a duplicate: the referenced question is about greediness, this one is about operator precedence. – SJuan76 Aug 26 '17 at 21:40
  • Try posting the content of the file you want to read; that would help with answers. – Abe Aug 27 '17 at 01:18
  • A dupe of [Regex match entire words only](https://stackoverflow.com/questions/1751301/regex-match-entire-words-only). All you need is `"int\\b|float|char\\s\\w"` to avoid matching `int` in `intern`. – Wiktor Stribiżew Aug 27 '17 at 08:31
  • I think you can use a regex like `"(int|float|char)\\s+\\w"` ;). – shA.t Aug 27 '17 at 11:13

2 Answers

1

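I assume you mean to find one of the alternatives, followed by a whitespace character and a word character.

But alternation has the lowest precedence, so the regex as written is parsed like this: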

 (?:
      int
   |                    # or,
      float
   |                    # or,
      char \s \w
 )

You can see from this breakdown that the \s\w applies only to the char alternative, so a bare int (as in "intern") or float matches on its own.

To fix that, bring the \s\w outside of the group so it applies to all
the alternatives.

 (?:
      int
   |                    # or,
      float
   |                    # or,
      char 
 )
 \s \w

The final regex is then "(?:int|float|char)\\s\\w"
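
To see the difference, here is a minimal sketch that runs both patterns over a few sample lines. The class name PrecedenceDemo, the helper found, and the sample strings are made up for illustration; only the two regexes come from the question and this answer.

```java
import java.util.regex.Pattern;

public class PrecedenceDemo {
    // Pattern from the question: \s\w binds only to the "char" alternative.
    static final Pattern ORIGINAL = Pattern.compile("int|float|char\\s\\w");
    // Grouped pattern from this answer: \s\w applies to every alternative.
    static final Pattern GROUPED = Pattern.compile("(?:int|float|char)\\s\\w");

    // Hypothetical helper: true if the pattern matches anywhere in the line.
    static boolean found(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not taken from the asker's file.
        String[] samples = {"intern", "int x;", "float y;", "char c;", "char"};
        for (String s : samples) {
            System.out.println(s + " -> original=" + found(ORIGINAL, s)
                    + ", grouped=" + found(GROUPED, s));
        }
    }
}
```

With the original pattern "intern" counts as a match because the bare int alternative succeeds; with the grouped pattern it does not, since int must be followed by whitespace. Note that the grouped pattern still uses find(), so it can match inside a longer word such as "print x".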

0

Surround the options with parentheses like so:

Pattern p = Pattern.compile("(int|float|char)\\s\\w");

Also, if you want to cover some edge cases in order to deal with badly formatted code, you can use:

Pattern p = Pattern.compile("^(\\s|\\t)*(int|float|char)(\\s|\\t)+[a-zA-Z_][a-zA-Z0-9_]*(\\s|\\t)*");

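This should cover cases where there is more than one space or tab between the type and the variable name, variable names starting with an underscore, and cases where "int", "float" or "char" is the end of some longer word (such as "print"), which the ^ anchor rejects. Note that \s already includes the tab character, so (\\s|\\t) is equivalent to \\s.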
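
As a rough check of those claims, here is a small sketch that applies the longer pattern to a few lines. The class name DeclarationDemo, the helper startsWithDeclaration, and the sample strings are assumptions for illustration only; the regex is the one shown above.

```java
import java.util.regex.Pattern;

public class DeclarationDemo {
    // The edge-case pattern from this answer, unchanged.
    static final Pattern DECLARATION = Pattern.compile(
            "^(\\s|\\t)*(int|float|char)(\\s|\\t)+[a-zA-Z_][a-zA-Z0-9_]*(\\s|\\t)*");

    // Hypothetical helper: true if the line begins (after optional
    // whitespace) with one of the three types followed by an identifier.
    static boolean startsWithDeclaration(String line) {
        return DECLARATION.matcher(line).find();
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not taken from the asker's file.
        String[] samples = {
            "  int   _count = 0;", "int\tx;", "print x", "intern", "unsigned int x;"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + startsWithDeclaration(s));
        }
    }
}
```

Because of the ^ anchor, the type has to be the first token on the line, so lines like "unsigned int x;" or a declaration inside a for header are not counted. Whether that is acceptable depends on what the file actually contains.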

Ori Shalom