I have written a regular expression in Java that validates a given address and uses capturing groups to separate out the street number and name, city, state, and zip code.
My code is as follows:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String address = "1600 Pennsylvania Ave NW, Washington, DC 20500";
String regex = "(\\s*\\d*\\s*,?\\s*(\\w*\\s*)+),?\\s*(\\w*\\s*)+\\s*,?\\s*(\\w{2})?\\s*,?\\s*(\\d{5})?\\s*";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(address);
if (matcher.matches()) {
    // groupCount() does not include group 0 (the entire match)
    int groupCount = matcher.groupCount();
    System.out.println(groupCount);
    // Print group 0 followed by each capturing group
    for (int i = 0; i <= groupCount; i++) {
        String group = matcher.group(i);
        System.out.println(group);
    }
} else {
    System.out.println("Does not match");
}
The output of the code is as follows:
5
1600 Pennsylvania Ave NW, Washington, DC 20500
1600 Pennsylvania Ave NW
DC
20500
I understand that the second line of the output is group 0, which is the entire matched string as per the Javadocs. What I do not understand is why "Washington" is not printed. Instead, two blank lines get printed.
Can someone please explain to me what is wrong here?
Some more information: the user may or may not put commas in the address string, and may put multiple spaces between two words. The state will always be a two-letter state code.
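To make the expected result concrete, here is a minimal sketch of the kind of output being asked for. It is not a fix of the regex above. It assumes that a comma always separates the street from the city (stricter than the requirement just described, because without that comma the split between street and city is ambiguous), and it gives each part its own non-repeated capturing group. The class name AddressSketch and the method parse are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressSketch {
    // Assumption: a comma always separates the street from the city.
    // The commas before the state and the zip code stay optional.
    // Each part has its own capturing group, and none of them is repeated.
    static final Pattern ADDRESS = Pattern.compile(
            "\\s*(\\d+\\s+[^,]+?)\\s*,"      // group 1: street number and name
            + "\\s*([^,]+?)\\s*,?"           // group 2: city
            + "\\s*([A-Za-z]{2})\\s*,?"      // group 3: two-letter state code
            + "\\s*(\\d{5})\\s*");           // group 4: five-digit zip code

    // Returns {street, city, state, zip}, or null if the address does not match.
    static String[] parse(String address) {
        Matcher m = ADDRESS.matcher(address);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = parse("1600 Pennsylvania Ave NW, Washington, DC 20500");
        if (parts == null) {
            System.out.println("Does not match");
        } else {
            for (String part : parts) {
                System.out.println(part);
            }
        }
    }
}
```

For the sample address, the four parts should come out as the street, "Washington", "DC" and "20500". Extra spaces around the separators are absorbed by the \\s* parts of the pattern, while spaces inside the street or city name are kept as typed.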
Thanks Raj