Although it might be possible to use regular expressions to extract the values, you'll still have to make some decisions about those values and validate them:
- ambiguous cases (as pointed out in @BasilBourque's answer) like 11.04.1983, when you don't know if it's April 11th or November 4th: in this case, you'll have to choose one (or maybe try to guess)
- validating other values: if you get things like day 32, or Feb 29th in a non-leap year, or April 31st, you'll need to check the values before using them
- and there's also the problem of calculating the age
For case 1, well, there's not much to do other than guess. If you can receive dates in any format, there's no way to really resolve this ambiguity (unless you assume some format is preferred).
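To make the ambiguity concrete, here is a minimal sketch (assuming Java 8's java.time; the class and method names are just illustrative) that parses the same text with two different assumptions about the field order. Both calls succeed, so nothing in the input itself tells you which one is right:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class AmbiguousDateDemo {
    // Hypothetical helper: assumes day.month.year
    static LocalDate parseDayFirst(String s) {
        return LocalDate.parse(s, DateTimeFormatter.ofPattern("dd.MM.yyyy"));
    }

    // Hypothetical helper: assumes month.day.year
    static LocalDate parseMonthFirst(String s) {
        return LocalDate.parse(s, DateTimeFormatter.ofPattern("MM.dd.yyyy"));
    }

    public static void main(String[] args) {
        String input = "11.04.1983";
        System.out.println(parseDayFirst(input));   // 1983-04-11
        System.out.println(parseMonthFirst(input)); // 1983-11-04
    }
}
```

Whichever formatter you try first decides the result, so the order of your formatters is effectively your "preferred format".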
For cases 2 and 3, though, Java's API can help you, as it already knows how to validate dates and how to calculate the difference between them.
If you're using Java 8, consider using the new java.time API. It's easier, less buggy and less error-prone than the old APIs.
If you're using Java <= 7, you can use the ThreeTen Backport, a great backport for Java 8's new date/time classes. And for Android, there's the ThreeTenABP (more on how to use it here).
The code below works for both. The only difference is the package names (in Java 8 it's java.time and in ThreeTen Backport (or Android's ThreeTenABP) it's org.threeten.bp), but the class and method names are the same.
First, I create a list of DateTimeFormatter objects, each one capable of parsing one (or more) of the formats. For some cases I use optional sections (delimited by []), because some patterns differ only by a space or a comma, so keeping them optional allows me to use the same formatter for both cases.
Other cases are trickier and require a more complex approach, using a DateTimeFormatterBuilder (see comments in the code).
After that, I remove some unnecessary stuff (like Place of birth, and spaces at the beginning and end), then I try to parse the date with each formatter until one works (or get a null if none works).
Then I use the date to calculate the age in years, using the ChronoUnit class.
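// imports for Java 8 (for ThreeTen Backport, replace java.time with org.threeten.bp)
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// list of different formatters
List<DateTimeFormatter> list = new ArrayList<>();
// 3-3-1986 (assuming it's day-month-year)
list.add(DateTimeFormatter.ofPattern("d-M-yyyy"));
// 11.04.1983 (assuming it's day.month.year)
list.add(DateTimeFormatter.ofPattern("dd.MM.yyyy"));
// 07/24/1969 (month/day/year)
list.add(DateTimeFormatter.ofPattern("MM/dd/yyyy"));
// "December, 05, 1986", "NOVEMBER 03, 1981", "July 3,1989" and "June,11,1979"
// for " OCTOBER 06,1973", I'll remove the spaces before parsing
list.add(new DateTimeFormatterBuilder()
    // case insensitive for month name
    .parseCaseInsensitive()
    // optional "," after month and optional spaces (after month and before year)
    .appendPattern("MMMM[ ][','][ ]d','[ ]yyyy")
    // use English locale for month name
    .toFormatter(Locale.ENGLISH));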
// "May 18th, 1984", "Jan. 27th, 1967" and "Nov. 18, 1976"
// map each day of the month to its text with the ordinal suffix (st, nd, rd or th)
Map<Long, String> days = new HashMap<>();
for (int i = 1; i <= 31; i++) {
    String s;
    switch (i) {
        case 1:
        case 21:
        case 31:
            s = "st";
            break;
        case 2:
        case 22:
            s = "nd";
            break;
        case 3:
        case 23:
            s = "rd";
            break;
        default:
            s = "th";
    }
    days.put((long) i, i + s);
}
list.add(new DateTimeFormatterBuilder()
    // month name with optional "."
    .appendPattern("MMM[.] ")
    // optional day with suffix
    .optionalStart().appendText(ChronoField.DAY_OF_MONTH, days).optionalEnd()
    // optional day without suffix
    .optionalStart().appendValue(ChronoField.DAY_OF_MONTH, 1, 2, SignStyle.NEVER).optionalEnd()
    // year
    .appendPattern(", yyyy")
    // use English locale for month name
    .toFormatter(Locale.ENGLISH));
// 27/02/1978 Place of birthLisbon, Portugal ("Place of birth etc" will be removed manually)
// note: an ambiguous input such as 05/04/1978 is taken by the MM/dd/yyyy formatter above, because it comes first in the list
list.add(DateTimeFormatter.ofPattern("dd/MM/yyyy"));
String[] inputs = new String[] {
    "3-3-1986", "11.04.1983", "07/24/1969", "December, 05, 1986", "NOVEMBER 03, 1981", " OCTOBER 06,1973",
    "May 18th, 1984", "Jan. 27th, 1967", "Nov. 18, 1976", "July 3,1989",
    "27/02/1978 Place of birthLisbon, Portugal", "June,11,1979"
};
LocalDate now = LocalDate.now(); // current date
for (String s : inputs) {
    LocalDate d = parse(list, s);
    if (d != null) {
        // get age in years
        long years = ChronoUnit.YEARS.between(d, now);
        System.out.println(s + " -> " + d + ", age: " + years);
    }
}
// auxiliary method
public LocalDate parse(List<DateTimeFormatter> list, String s) {
    // remove the unnecessary stuff
    // you can customize it to remove whatever unnecessary stuff you have in the inputs
    String input = s.replaceAll("Place of birth.*", "").trim();
    for (DateTimeFormatter fmt : list) {
        try {
            return LocalDate.parse(input, fmt);
        } catch (DateTimeParseException e) {
            // can't parse: do nothing and try the next DateTimeFormatter
        }
    }
    // can't parse, return null
    return null;
}
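As a quick check of the age calculation, here is a minimal sketch (assuming Java 8's java.time; AgeDemo and ageOn are just illustrative names) that uses fixed dates instead of LocalDate.now(), so the result doesn't depend on the day you run it. ChronoUnit.YEARS.between counts only completed years, which is what you usually want for an age:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

class AgeDemo {
    // Hypothetical helper: age in completed years on a given reference date
    static long ageOn(LocalDate birthDate, LocalDate onDate) {
        return ChronoUnit.YEARS.between(birthDate, onDate);
    }

    public static void main(String[] args) {
        LocalDate birth = LocalDate.of(1983, 4, 11);
        // one day before the birthday: the 34th year is not complete yet
        System.out.println(ageOn(birth, LocalDate.of(2017, 4, 10))); // 33
        // on the birthday itself
        System.out.println(ageOn(birth, LocalDate.of(2017, 4, 11))); // 34
    }
}
```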
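Of course this code doesn't accept every possible pattern, because that's impossible. You must map all the patterns you can receive, and add new ones to the list as they arise. (Even if you use regex, you'll probably have to change it to handle new cases, but the formatters parse and validate the dates for you, and the ages are then calculated correctly.) One caveat: formatters use ResolverStyle.SMART by default, which rejects day 32 but silently adjusts Feb 29th in a non-leap year to Feb 28th (and April 31st to April 30th). If you want those inputs rejected instead, call withResolverStyle(ResolverStyle.STRICT) on the formatter and use uuuu instead of yyyy in the pattern.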
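How strictly a formatter validates depends on its resolver style, so it's worth checking what happens with an impossible date. A minimal sketch, assuming Java 8's java.time (StrictParsingDemo and parseOrNull are just illustrative names): the same pattern is used once with the default style and once with ResolverStyle.STRICT. Note the uuuu in the strict pattern: with yyyy (year-of-era) a strict formatter can't resolve the date without an era.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class StrictParsingDemo {
    // default resolver style (SMART)
    static final DateTimeFormatter SMART = DateTimeFormatter.ofPattern("dd.MM.yyyy");
    // strict resolver style; "uuuu" (year) instead of "yyyy" (year-of-era)
    static final DateTimeFormatter STRICT = DateTimeFormatter.ofPattern("dd.MM.uuuu")
            .withResolverStyle(ResolverStyle.STRICT);

    // Hypothetical helper: returns null when the text can't be parsed as a date
    static LocalDate parseOrNull(String s, DateTimeFormatter fmt) {
        try {
            return LocalDate.parse(s, fmt);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // 2015 is not a leap year
        System.out.println(parseOrNull("29.02.2015", SMART));  // 2015-02-28 (silently adjusted)
        System.out.println(parseOrNull("29.02.2015", STRICT)); // null (rejected)
        System.out.println(parseOrNull("32.01.2015", SMART));  // null (day 32 is never valid)
        System.out.println(parseOrNull("29.02.2016", STRICT)); // 2016-02-29 (leap year)
    }
}
```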
You can also check the javadoc for more information about all the existing patterns, in case you need different ones.
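When you do need a new pattern, optional sections often let one formatter cover small variations. A minimal sketch, assuming Java 8's java.time (OptionalSectionDemo and its parse method are just illustrative names), where the space after the comma is optional:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class OptionalSectionDemo {
    // "[ ]" is an optional section containing a single space
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("MMMM d','[ ]yyyy", Locale.ENGLISH);

    // Hypothetical helper: parses text with the formatter above
    static LocalDate parse(String s) {
        return LocalDate.parse(s, FMT);
    }

    public static void main(String[] args) {
        System.out.println(parse("July 3,1989"));  // 1989-07-03
        System.out.println(parse("July 3, 1989")); // 1989-07-03
    }
}
```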
If you really want to do it with regex and have found a way to extract the values, you can also do:
// assuming you've already got year, month, day from the regex
LocalDate d = LocalDate.of(year, month, day);
This will throw a DateTimeException if the values produce an invalid date. If the date is valid, you can use it to calculate the age, as already shown above.
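Putting the regex route together, here is a minimal sketch (assuming Java 8's java.time; the pattern only handles the day.month.year form, and RegexDateDemo and extract are just illustrative names). The regex extracts the numbers and LocalDate.of does the validation:

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexDateDemo {
    // Hypothetical pattern: day.month.year separated by dots, nothing else
    static final Pattern DMY = Pattern.compile("(\\d{1,2})\\.(\\d{1,2})\\.(\\d{4})");

    // Returns null if the pattern is not found or the values don't form a valid date
    static LocalDate extract(String s) {
        Matcher m = DMY.matcher(s);
        if (!m.find()) {
            return null;
        }
        int day = Integer.parseInt(m.group(1));
        int month = Integer.parseInt(m.group(2));
        int year = Integer.parseInt(m.group(3));
        try {
            return LocalDate.of(year, month, day);
        } catch (DateTimeException e) {
            // e.g. April 31st, or Feb 29th in a non-leap year
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(extract("11.04.1983")); // 1983-04-11
        System.out.println(extract("31.04.1983")); // null
        System.out.println(extract("29.02.2015")); // null
    }
}
```

Each additional format would need its own pattern (and its own decision about field order), which is the maintenance cost mentioned below.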
Anyway, I don't think it's possible with one single regex (even if it is, I think it'll be so complicated that it will become a nightmare to maintain) - you'll probably have to create lots of different ones and loop through them as well.
Don't get me wrong, regexes are cool, but they're not the best solution to everything.