-1

I am reading date-of-birth data from an Oracle table where the column type is varchar2 instead of date. The data comes from a CV parsing company, and different CVs write the date of birth in different styles, such as:

3-3-1986
11.04.1983
07/24/1969
December, 05, 1986
NOVEMBER 03, 1981
    OCTOBER 06,1973
May 18th, 1984
Jan. 27th, 1967
Nov. 18, 1976
July 3,1989
27/02/1978 Place of birthLisbon, Portugal
June,11,1979

Here is the method I have written so far:

public int getAge(String dob) {
    int age = 0;
    if (dob == null || dob.equals("")) {
        age = 0;
    } else {
        dob = dob.trim();
        String[] words = dob.split("-|/");
        String day = words[0];
        String month = words[1];
        String year = words[2];
        age = CalculateAge.AgeCalculator(day, month, year);
    }

    return age;
}

But this method only handles slashes and dashes. Please help me work out how to get the day, month and year accurately from the sample dates above.

Ghayel
  • Possible duplicate of [Java string to date conversion](https://stackoverflow.com/questions/4216745/java-string-to-date-conversion) – Populus Jul 25 '17 at 21:35
  • I agree, @Populus, that some inspiration can be found in the question you are linking to, but it’s hardly a strict duplicate. Possibly one can be found if we search a little longer… – Ole V.V. Jul 26 '17 at 00:15
  • [This Google search](https://www.google.dk/search?site=&source=hp&q=java+parse+date+different+formats&oq=java+parse+date+different+formats&gs_l=psy-ab.3..33i22i29i30k1l2.1380.1380.0.1898.1.1.0.0.0.0.206.206.2-1.1.0....0...1..64.psy-ab..0.1.205.eW6BwwQFK10) seems to suggest quite a number of similar questions. Beware though that most answers use the now long outdated `SimpleDateFormat` where today we should prefer `DateTimeFormatter`. The idea from those answers can be applied to the modern class, though. – Ole V.V. Jul 26 '17 at 00:22
  • Possible duplicate of [How to parse dates in multiple formats using SimpleDateFormat](https://stackoverflow.com/questions/4024544/how-to-parse-dates-in-multiple-formats-using-simpledateformat) – Ole V.V. Jul 26 '17 at 00:29
  • 2
    There's no single all-in-one method to parse all inputs to a date. You'll have to map all the possible formats, create a `DateTimeFormatter` for each one, and loop over these formatters, trying to parse to a `LocalDate` and moving on to the next one if you get a parsing error. You can read how to use a formatter in [Oracle's date and time tutorial](https://docs.oracle.com/javase/tutorial/datetime/iso/format.html), and take a look at the java-time and date-parsing tags for similar questions related to date parsing –  Jul 26 '17 at 00:59

2 Answers

4

You cannot.

Parsing any string in any conceivable format is impossible.

Take one of your examples: 11.04.1983

Is that April 11th or November 4th? There is simply no way to know.

The best you can do is extract the year when you see a four-digit number, and perhaps infer which number is the day-of-month when it is greater than 12.
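As a rough illustration of that heuristic, here is a minimal sketch. The class name `BirthDateHints`, the `Order` enum and both regular expressions are assumptions made up for this example, not an established API. It only reports what can be inferred from the text and flags the rest as ambiguous or unknown.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts hints from a date string without deciding ambiguous cases.
final class BirthDateHints {

    enum Order { DAY_FIRST, MONTH_FIRST, AMBIGUOUS, UNKNOWN }

    // Exactly four digits that are not part of a longer run of digits.
    private static final Pattern FOUR_DIGIT_YEAR =
            Pattern.compile("(?<!\\d)\\d{4}(?!\\d)");

    // Two numeric fields and a four-digit year, separated by '-', '.' or '/'.
    private static final Pattern NUMERIC_DATE =
            Pattern.compile("(?<!\\d)(\\d{1,2})[-./](\\d{1,2})[-./]\\d{4}(?!\\d)");

    // Returns the first four-digit number found, if any.
    static OptionalInt extractYear(String text) {
        Matcher m = FOUR_DIGIT_YEAR.matcher(text);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group())) : OptionalInt.empty();
    }

    // A field greater than 12 cannot be a month, so it must be the day.
    static Order guessOrder(String text) {
        Matcher m = NUMERIC_DATE.matcher(text);
        if (!m.find()) {
            return Order.UNKNOWN; // not a purely numeric date
        }
        int first = Integer.parseInt(m.group(1));
        int second = Integer.parseInt(m.group(2));
        if (first > 12 && second <= 12) {
            return Order.DAY_FIRST;
        }
        if (second > 12 && first <= 12) {
            return Order.MONTH_FIRST;
        }
        if (first <= 12 && second <= 12) {
            return Order.AMBIGUOUS; // e.g. 11.04.1983 (harmless only when both fields are equal)
        }
        return Order.UNKNOWN; // both fields above 12: not a valid day/month pair
    }
}
```

For `11.04.1983` this yields the year 1983 and `AMBIGUOUS`, which is exactly the case where no code can decide for you.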


By the way, it seems odd to be tracking birth dates and going to so much trouble to calculate the age of job applicants. Age generally makes a poor criterion for job qualification, and using it that way is illegal in many places.

Basil Bourque
  • Can we find with regex which date format is parsed and then convert it to date? – Ghayel Jul 26 '17 at 06:00
  • @Ghayel (A) I do not understand your comment’s question. Regex is not magic; Regex won’t be able to decide April 11 from November 4. (B) Your topic has been addressed many times already on Stack Overflow. Always search thoroughly before posting. – Basil Bourque Jul 26 '17 at 06:40
  • Is there any regex to extract year from the above cited date styles? – Ghayel Jul 26 '17 at 07:11
  • 1
    No need for regex. Just split the string and look for a number with four digits. For all the other formats you'll need various string-splitters, regex, or `DateTimeFormatter` objects for the various patterns. Again, no magic. Search Stack Overflow, as the topics of splitting strings, using regex, and using java.time classes for parsing have already been addressed many many many times. – Basil Bourque Jul 26 '17 at 07:15
  • 1
    @Ghayel Regex is not the best solution to this problem. You should use a lot of different `DateTimeFormatter` objects for each pattern, as already said. Although it might be possible with a single regex, it will be too complicated (to do and to maintain) and not worth the effort, IMO. –  Jul 26 '17 at 09:57
  • I used the regex `(\\d{4})$` and it worked fine for extracting the year. Now the month remains – Ghayel Jul 26 '17 at 18:53
0

Although it might be possible to use regular expressions to get the values, you'll still have to make some decisions/validations on those values:

  1. ambiguous cases (as pointed out in @BasilBourque's answer) like 11.04.1983, when you don't know if it's April 11th or November 4th: in this case, you'll have to choose one (or maybe try to guess)
  2. validate other values: if you get things like day 32, or Feb 29th in a non-leap year, or April 31st, you'll need to check the values before using them
  3. and there's also the problem of calculating the age

For case 1, well, there's not much to do other than guess. If you can receive dates in any format, there's no way to really resolve this ambiguity (unless you assume some format is preferred).

For cases 2 and 3, though, Java's API can help you, as it already does all the validations you need.
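To make that concrete, here is a small sketch of both points. The class `BirthDateChecks` and its method names are hypothetical, chosen for this example only; the calls to `LocalDate.of` and `ChronoUnit.YEARS.between` are the standard java.time ones.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical helper showing validation (case 2) and age calculation (case 3).
final class BirthDateChecks {

    // LocalDate.of rejects impossible dates (day 32, Feb 29th in a non-leap year, April 31st).
    static boolean isValidDate(int year, int month, int day) {
        try {
            LocalDate.of(year, month, day);
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    // Number of complete years between the birth date and the given date.
    static long ageInYears(LocalDate birthDate, LocalDate onDate) {
        return ChronoUnit.YEARS.between(birthDate, onDate);
    }
}
```

Passing the reference date explicitly (instead of calling `LocalDate.now()` inside the method) is a design choice that keeps the calculation easy to test.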

If you're using Java 8, consider using the new java.time API. It's easier to use, less buggy and less error-prone than the old APIs.

If you're using Java <= 7, you can use the ThreeTen Backport, a great backport for Java 8's new date/time classes. And for Android, there's the ThreeTenABP (more on how to use it here).

The code below works for both. The only difference is the package name (java.time in Java 8, org.threeten.bp in ThreeTen Backport and Android's ThreeTenABP); the class and method names are the same.

First, I create a list of DateTimeFormatter objects, each one capable of parsing one (or more) of the formats.

For some cases I use optional sections (delimited by []), because some patterns differ only by a space or a comma, so keeping them optional allows me to use the same formatter for both cases.
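In isolation, an optional section looks like the sketch below. The class name `OptionalSectionDemo` and the exact pattern are assumptions for illustration; the point is only that `[ ]` lets one formatter accept the input with or without the space after the comma.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical demo of an optional section in a DateTimeFormatter pattern.
final class OptionalSectionDemo {

    // "[ ]" makes the space between the comma and the year optional.
    static final DateTimeFormatter FMT =
            DateTimeFormatter.ofPattern("MMMM d','[ ]yyyy", Locale.ENGLISH);

    // Throws DateTimeParseException if the text does not match the pattern.
    static LocalDate parse(String text) {
        return LocalDate.parse(text, FMT);
    }
}
```

Both `"July 3,1989"` and `"July 3, 1989"` are accepted by this single formatter.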

Other cases are trickier and require a more complex approach, using a DateTimeFormatterBuilder (see comments in the code).

After that, I remove some unnecessary text (like "Place of birth" and leading or trailing spaces), then I try to parse the date with each formatter in turn until one works (or return null if none does).

Then I use the date to calculate the age in years, using the ChronoUnit class.

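import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.format.SignStyle;
import java.time.temporal.ChronoField;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// list of different formatters
List<DateTimeFormatter> list = new ArrayList<>();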

// 3-3-1986 (assuming it's day-month-year)
list.add(DateTimeFormatter.ofPattern("d-M-yyyy"));

// 11.04.1983 (assuming it's day.month.year)
list.add(DateTimeFormatter.ofPattern("dd.MM.yyyy"));

// 07/24/1969 (month/day/year)
list.add(DateTimeFormatter.ofPattern("MM/dd/yyyy"));

// "December, 05, 1986", "NOVEMBER 03, 1981", "July 3,1989" and "June,11,1979"
// for " OCTOBER 06,1973", I'll remove the spaces before parsing
list.add(new DateTimeFormatterBuilder()
    // case insensitive for month name
    .parseCaseInsensitive()
    // optional "," after month and optional spaces (after month and before year)
    .appendPattern("MMMM[ ][','][ ]d','[ ]yyyy")
    // use English locale for month name
    .toFormatter(Locale.ENGLISH));

// "May 18th, 1984", "Jan. 27th, 1967" and "Nov. 18, 1976"
// map each day of month to its text with ordinal suffix (1st, 2nd, 3rd, 4th, ...)
Map<Long, String> days = new HashMap<>();
for (int i = 1; i <= 31; i++) {
    String s;
    switch (i) {
        case 1:
        case 21:
        case 31:
            s = "st";
            break;
        case 2:
        case 22:
            s = "nd";
            break;
        case 3:
        case 23:
            s = "rd";
            break;
        default:
            s = "th";
    }
    days.put((long) i, i + s);
}
list.add(new DateTimeFormatterBuilder()
    // month name with optional "."
    .appendPattern("MMM[.] ")
    // optional day with suffix
    .optionalStart().appendText(ChronoField.DAY_OF_MONTH, days).optionalEnd()
    // optional day without suffix
    .optionalStart().appendValue(ChronoField.DAY_OF_MONTH, 1, 2, SignStyle.NEVER).optionalEnd()
    // year
    .appendPattern(", yyyy")
    // use English locale for month name
    .toFormatter(Locale.ENGLISH));

// 27/02/1978 Place of birthLisbon, Portugal ("Place of birth etc" will be removed manually)
list.add(DateTimeFormatter.ofPattern("dd/MM/yyyy"));

String[] inputs = new String[] { "3-3-1986", "11.04.1983", "07/24/1969", "December, 05, 1986", "NOVEMBER 03, 1981", "    OCTOBER 06,1973",
                "May 18th, 1984", "Jan. 27th, 1967", "Nov. 18, 1976", "July 3,1989", "27/02/1978 Place of birthLisbon, Portugal", "June,11,1979" };
LocalDate now = LocalDate.now(); // current date
for (String s : inputs) {
    LocalDate d = parse(list, s);
    if (d != null) {
        // get age in years
        long years = ChronoUnit.YEARS.between(d, now);
        System.out.println(s.trim() + " -> " + d + ", age: " + years);
    }
}

// auxiliary method
public LocalDate parse(List<DateTimeFormatter> list, String s) {
    // remove the unnecessary stuff
    // you can customize it to remove whatever unnecessary stuff you have in the inputs
    String input = s.replaceAll("Place of birth.*", "").trim();

    for (DateTimeFormatter fmt : list) {
        try {
            return LocalDate.parse(input, fmt);
        } catch (DateTimeParseException e) {
            // can't parse: do nothing and try the next DateTimeFormatter
        }
    }

    // can't parse, return null
    return null;
}

Of course this code doesn't accept every possible pattern, because that's impossible. You must map all the patterns you can receive, and add new ones to the list as they arise. (Even if you use regex, you'll probably have to change it to handle new cases, but with formatters the parsing and validation are done for you, subject to the formatter's resolver style, and the resulting dates let you calculate the ages correctly.)
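One caveat on validation: formatters created with `ofPattern` use the SMART resolver style by default, which rejects day 32 but adjusts a day such as April 31 to the last valid day of the month instead of failing. If you want such inputs rejected, a strict formatter can be used. The sketch below assumes a dd/MM/year input; the class `StrictParsing` and the helper `tryParse` are hypothetical names. Note the `uuuu` for the year, since `yyyy` (year-of-era) needs an era to resolve in strict mode.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.Optional;

// Hypothetical comparison of the default (SMART) and STRICT resolver styles.
final class StrictParsing {

    static final DateTimeFormatter SMART = DateTimeFormatter.ofPattern("dd/MM/uuuu");

    static final DateTimeFormatter STRICT =
            DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT);

    // Returns the parsed date, or empty if the formatter rejects the text.
    static Optional<LocalDate> tryParse(String text, DateTimeFormatter fmt) {
        try {
            return Optional.of(LocalDate.parse(text, fmt));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

With these formatters, `"31/04/1986"` parses to 1986-04-30 under SMART and is rejected under STRICT.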

You can also check the DateTimeFormatter javadoc for more information about all the existing pattern letters, in case you need different ones.


If you really want to use regex and have found a way to extract the values, you can also do:

// assuming you've already got year, month, day from the regex
LocalDate d = LocalDate.of(year, month, day);

This will throw a DateTimeException if the values produce an invalid date. If the date is valid, you can use it to calculate the age, as already shown above.
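Put together, a regex-based variant could look like the sketch below. It is only an illustration under stated assumptions: the class `RegexDateExtractor` and its pattern are made up for this example, it handles just the numeric styles separated by `-`, `.` or `/`, and it assumes day-first order.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor: regex finds the numbers, LocalDate.of validates them.
final class RegexDateExtractor {

    // Assumed layout: day, month, four-digit year (day-first), separated by '-', '.' or '/'.
    private static final Pattern DAY_MONTH_YEAR =
            Pattern.compile("(?<!\\d)(\\d{1,2})[-./](\\d{1,2})[-./](\\d{4})(?!\\d)");

    // Returns the date, or empty if nothing matches or the values form an invalid date.
    static Optional<LocalDate> extract(String text) {
        Matcher m = DAY_MONTH_YEAR.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        int day = Integer.parseInt(m.group(1));
        int month = Integer.parseInt(m.group(2));
        int year = Integer.parseInt(m.group(3));
        try {
            return Optional.of(LocalDate.of(year, month, day));
        } catch (DateTimeException e) {
            return Optional.empty(); // e.g. month 24 or April 31st
        }
    }
}
```

Trailing text such as "Place of birth..." is ignored because the regex uses `find()` rather than matching the whole string; month-name styles would need further patterns.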

Anyway, I don't think it's possible with one single regex (even if it is, I think it'll be so complicated that it will become a nightmare to maintain). You'll probably have to create lots of different ones and loop through them as well.

Don't get me wrong, regexes are cool, but they're not the best solution to everything.