90

I know this question is asked quite a bit, and obviously you can't parse any arbitrary date. However, I find that the python-dateutil library is able to parse every date I throw at it, all while requiring absolutely zero effort in figuring out a date format string. Joda time is always sold as being a great Java date parser, but it still requires you to decide what format your date is in before you pick a Format (or create your own). You can't just call DateFormatter.parse(mydate) and magically get a Date object back.

For example, the date "Wed Mar 04 05:09:06 GMT-06:00 2009" is properly parsed with python-dateutil:

import dateutil.parser
print(dateutil.parser.parse('Wed Mar 04 05:09:06 GMT-06:00 2009'))

but the following Joda time call doesn't work:

    String date = "Wed Mar 04 05:09:06 GMT-06:00 2009";
    DateTimeFormatter fmt = ISODateTimeFormat.dateTime();
    DateTime dt = fmt.parseDateTime(date);
    System.out.println(dt);

And creating your own DateTimeFormatter defeats the purpose, since that seems to be the same as using SimpleDateFormat with the correct format string.

Is there a comparable way to parse a date in Java, like python-dateutil? I don't care about errors, I just want it to be mostly perfect.

Max

6 Answers

121

Your best bet is really to use regex to match the date format pattern and/or to brute force it.

Several years ago I wrote a silly little DateUtil class which did the job. Here's the relevant extract:

private static final Map<String, String> DATE_FORMAT_REGEXPS = new HashMap<String, String>() {{
    put("^\\d{8}$", "yyyyMMdd");
    put("^\\d{1,2}-\\d{1,2}-\\d{4}$", "dd-MM-yyyy");
    put("^\\d{4}-\\d{1,2}-\\d{1,2}$", "yyyy-MM-dd");
    put("^\\d{1,2}/\\d{1,2}/\\d{4}$", "MM/dd/yyyy");
    put("^\\d{4}/\\d{1,2}/\\d{1,2}$", "yyyy/MM/dd");
    put("^\\d{1,2}\\s[a-z]{3}\\s\\d{4}$", "dd MMM yyyy");
    put("^\\d{1,2}\\s[a-z]{4,}\\s\\d{4}$", "dd MMMM yyyy");
    put("^\\d{12}$", "yyyyMMddHHmm");
    put("^\\d{8}\\s\\d{4}$", "yyyyMMdd HHmm");
    put("^\\d{1,2}-\\d{1,2}-\\d{4}\\s\\d{1,2}:\\d{2}$", "dd-MM-yyyy HH:mm");
    put("^\\d{4}-\\d{1,2}-\\d{1,2}\\s\\d{1,2}:\\d{2}$", "yyyy-MM-dd HH:mm");
    put("^\\d{1,2}/\\d{1,2}/\\d{4}\\s\\d{1,2}:\\d{2}$", "MM/dd/yyyy HH:mm");
    put("^\\d{4}/\\d{1,2}/\\d{1,2}\\s\\d{1,2}:\\d{2}$", "yyyy/MM/dd HH:mm");
    put("^\\d{1,2}\\s[a-z]{3}\\s\\d{4}\\s\\d{1,2}:\\d{2}$", "dd MMM yyyy HH:mm");
    put("^\\d{1,2}\\s[a-z]{4,}\\s\\d{4}\\s\\d{1,2}:\\d{2}$", "dd MMMM yyyy HH:mm");
    put("^\\d{14}$", "yyyyMMddHHmmss");
    put("^\\d{8}\\s\\d{6}$", "yyyyMMdd HHmmss");
    put("^\\d{1,2}-\\d{1,2}-\\d{4}\\s\\d{1,2}:\\d{2}:\\d{2}$", "dd-MM-yyyy HH:mm:ss");
    put("^\\d{4}-\\d{1,2}-\\d{1,2}\\s\\d{1,2}:\\d{2}:\\d{2}$", "yyyy-MM-dd HH:mm:ss");
    put("^\\d{1,2}/\\d{1,2}/\\d{4}\\s\\d{1,2}:\\d{2}:\\d{2}$", "MM/dd/yyyy HH:mm:ss");
    put("^\\d{4}/\\d{1,2}/\\d{1,2}\\s\\d{1,2}:\\d{2}:\\d{2}$", "yyyy/MM/dd HH:mm:ss");
    put("^\\d{1,2}\\s[a-z]{3}\\s\\d{4}\\s\\d{1,2}:\\d{2}:\\d{2}$", "dd MMM yyyy HH:mm:ss");
    put("^\\d{1,2}\\s[a-z]{4,}\\s\\d{4}\\s\\d{1,2}:\\d{2}:\\d{2}$", "dd MMMM yyyy HH:mm:ss");
}};

/**
 * Determine SimpleDateFormat pattern matching with the given date string. Returns null if
 * format is unknown. You can simply extend DateUtil with more formats if needed.
 * @param dateString The date string to determine the SimpleDateFormat pattern for.
 * @return The matching SimpleDateFormat pattern, or null if format is unknown.
 * @see SimpleDateFormat
 */
public static String determineDateFormat(String dateString) {
    for (String regexp : DATE_FORMAT_REGEXPS.keySet()) {
        if (dateString.toLowerCase().matches(regexp)) {
            return DATE_FORMAT_REGEXPS.get(regexp);
        }
    }
    return null; // Unknown format.
}

(cough, double brace initialization, cough, it was just to get it all to fit in 100 char max length ;) )

You can easily expand it yourself with new regex and dateformat patterns.
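For completeness, here is a minimal sketch of how the detected pattern might then be fed to SimpleDateFormat. The class name `DateGuessDemo` and the `parse` helper are hypothetical (they are not part of the original DateUtil), and the table is abridged to three entries to keep the example short:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class DateGuessDemo {
    // Abridged copy of the regex-to-pattern table, for illustration only.
    private static final Map<String, String> DATE_FORMAT_REGEXPS = new LinkedHashMap<>();
    static {
        DATE_FORMAT_REGEXPS.put("^\\d{8}$", "yyyyMMdd");
        DATE_FORMAT_REGEXPS.put("^\\d{4}-\\d{1,2}-\\d{1,2}$", "yyyy-MM-dd");
        DATE_FORMAT_REGEXPS.put("^\\d{1,2}\\s[a-z]{3}\\s\\d{4}$", "dd MMM yyyy");
    }

    // Same idea as determineDateFormat above: first matching regex wins, null if none match.
    static String determineDateFormat(String dateString) {
        String lower = dateString.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : DATE_FORMAT_REGEXPS.entrySet()) {
            if (lower.matches(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null; // Unknown format.
    }

    // Hypothetical helper: detect the pattern, then parse with it.
    static Date parse(String dateString) {
        String pattern = determineDateFormat(dateString);
        if (pattern == null) {
            throw new IllegalArgumentException("Unknown date format: " + dateString);
        }
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
        format.setLenient(false); // Reject impossible dates such as 2010-02-31.
        try {
            return format.parse(dateString);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unparseable date: " + dateString, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("2010-08-02"));
        System.out.println(parse("2 Aug 2010"));
    }
}
```

The sketch uses a LinkedHashMap so the lookup order is predictable if you ever add regexes that overlap.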

BalusC
  • 3
    What do you do with ambiguous dates? For example, what does `03/04/2010` mean - 3 April 2010 or 4 March 2010? – Jesper Aug 02 '10 at 21:21
  • 3
    I guess assume one or the other (configurable) – Bozho Aug 02 '10 at 21:38
  • 4
    @Jesper: the `/` separator is commonly used to denote `MM/dd/yyyy` (mainly used in US/English locales). The `-` separator is commonly used to denote `dd-MM-yyyy` (mainly used in European locales). – BalusC Aug 02 '10 at 21:49
  • 3
    @Jesper yea you have to decide between a month or day with the format otherwise you'll never get anywhere. – Max Oct 31 '10 at 07:13
  • 1
    This will catch a bunch of common cases but is *nowhere* *near* general enough for bulletproof use in the general case. – kittylyst May 10 '13 at 10:47
  • 3
    @kittylyst: That's correct. Even more, there doesn't exist a bulletproof approach for this :) – BalusC May 10 '13 at 11:27
  • Hello @BalusC I am using your regex pattern but it does not account for dates like Aug 19, 1990. I came up with the following regex `put("^[a-z]{3} \\d[1,2], \\d{4}$", "MMM dd, yyyy");` but it does not seem to work either, could you please suggest a solution? – User3 Aug 01 '14 at 12:20
  • @User3: your regex is wrong: [a-z]{3} doesn't match 'Aug', you could use [a-zA-Z]{3} or \\w{3} instead – gabor.harsanyi Oct 01 '15 at 10:40
  • 1
    @harcos: note that code in answer does a `toLowerCase()`. – BalusC Oct 01 '15 at 10:51
  • This doesn't work for a date as simple as `Mar 5, 2014 12:00:00` – azurh Oct 29 '15 at 21:33
  • My date format like this: ( Mon, 14 Dec 2015 12:48:00 GMT) – Iman Marashi Dec 14 '15 at 19:31
  • Nice one BalusC, love your work! Though this one is a little verbose. There will probably be a problem with this (due to the RE) if you want `theDateFormat.setLenient(false)`. If I were to attempt this I would write a solution with a list of all possible date formats and all possible time formats and all possible time zone formats. And then join them all to form an |a.b.c| size solution. And not use an RE but push the input through parse / format everytime. A task for another day! – HankCa Mar 04 '16 at 05:14
  • Can be made a bit more readable by generating the regexes dynamically from date formats: String regex = String.format("^%s$", dateFormat); regex = regex.replaceAll("\\.", "\\\\."); regex = regex.replaceAll("\\s", "\\\\s"); regex = regex.replace("yyyy", "\\d{4}"); regex = regex.replace("yy", "\\d{2}"); regex = regex.replace("MMMM", "[a-z]{4,}"); regex = regex.replace("MMM", "[a-z]{3}"); regex = regex.replace("MM", "\\d{1,2}"); regex = regex.replace("dd", "\\d{1,2}"); regex = regex.replace("HH", "\\d{1,2}"); regex = regex.replace("mm", "\\d{2}"); regex = regex.replace("ss", "\\d{2}"); – jgosar Apr 07 '17 at 08:30
  • Just to add since it's the format we needed for dates coming from 3rd party system, `put("^\\d\\d-[a-z]{3}-\\d\\d$", "dd-MMM-yy");` (icky two digit years) – Samuel Neff Jul 11 '17 at 20:30
52

There is a nice library called Natty which I think fits your purposes:

Natty is a natural language date parser written in Java. Given a date expression, natty will apply standard language recognition and translation techniques to produce a list of corresponding dates with optional parse and syntax information.

You can also try it online!

Cacovsky
17

You could try dateparser.

It can recognize any String automatically and parse it into a Date, Calendar, LocalDateTime, or OffsetDateTime correctly and quickly (1us~1.5us).

It isn't based on any natural language analyzer, SimpleDateFormat, or regex.Pattern.

With it, you don't have to prepare patterns such as yyyy-MM-dd'T'HH:mm:ss.SSSZ or yyyy-MM-dd'T'HH:mm:ss.SSSZZ:

Date date = DateParserUtils.parseDate("2015-04-29T10:15:00.500+0000");
Calendar calendar = DateParserUtils.parseCalendar("2015-04-29T10:15:00.500Z");
LocalDateTime dateTime = DateParserUtils.parseDateTime("2015-04-29 10:15:00.500 +00:00");

All works fine, please enjoy it.

sulin
  • Just had a look, it seems covering wide variety of formats – Sankalp Sep 17 '19 at 17:40
  • Works for my usecase – prodigy4440 Jan 12 '22 at 08:26
  • InputDate: "04/26/2022 12:00:00.000" - ExpectedDate: "04-26-2022" I just checked the [README.md](https://github.com/sisyphsu/dateparser) & saw the format I had was mentioned & was supported. Thanks. The following code saved my day: `LocalDateTime localDateTime = DateParserUtils.parseDateTime("04/26/2022 12:00:00.000");` `System.out.println("Parsed/Converted LocalDateTime form:: "+localDateTime.toString());` `System.out.println("Target/Expected date:: "+DateTimeFormatter.ofPattern("MM-dd-yyyy").format(localDateTime));` – Aniket Apr 27 '22 at 08:07
7

What I have seen done is a date util class that holds several typical date formats. When DateUtil.parse(date) is called, it tries each of its internal formats in turn and throws an exception only if none of them can parse the date.

It is basically a brute force approach to your problem.
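A minimal sketch of that idea, using java.time rather than any particular DateUtil class. The class name `BruteForceDateParser` and the four candidate patterns are my own assumptions, picked only for illustration; a real utility would carry whatever formats its inputs actually use:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class BruteForceDateParser {
    // Candidate formats, tried in order. The order also settles ambiguous
    // input: 03/04/2010 is read as MM/dd/uuuu here, i.e. 4 March 2010.
    private static final List<DateTimeFormatter> CANDIDATES = Arrays.asList(
            DateTimeFormatter.ofPattern("uuuu-MM-dd", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("MM/dd/uuuu", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("dd-MM-uuuu", Locale.ENGLISH),
            DateTimeFormatter.ofPattern("d MMM uuuu", Locale.ENGLISH));

    // Returns the result of the first format that accepts the text;
    // throws only when every candidate has failed.
    static LocalDate parse(String text) {
        String trimmed = text.trim();
        for (DateTimeFormatter candidate : CANDIDATES) {
            try {
                return LocalDate.parse(trimmed, candidate);
            } catch (DateTimeParseException ignored) {
                // This format did not match; try the next one.
            }
        }
        throw new DateTimeParseException("No known format matches", trimmed, 0);
    }

    public static void main(String[] args) {
        System.out.println(parse("2010-08-02"));
        System.out.println(parse("03/04/2010"));
        System.out.println(parse("2 Aug 2010"));
    }
}
```

The caller only ever sees parse(text); the list of formats stays an internal detail that can grow as new inputs show up.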

Robert Diana
  • I think this is the most straightforward and comprehensible approach. Since a date string of unknown format is ambiguous by design, putting too much "intelligence" into the attempt to recognize the format probably results in more "surprising" results. – Erich Kitzmueller Aug 02 '10 at 17:28
  • Yes, but I think there are a few assumptions you can make given a bit of starting information (order of day/month/year in a date) to correctly parse most sane dates without a big lookup table. – Max Aug 02 '10 at 23:21
  • Max, that is true, and most likely there is a limited set of date formats that you would be looking for. You can make very few assumptions about the order of day and month without writing a full blown date parsing engine. Is there a specific use case for this, because that could help point people in the right direction. For example, most date formats from various social media services fit into about 10 popular formats. – Robert Diana Aug 02 '10 at 23:32
  • Perhaps I'm more interested in the usability aspect. "Parse most dates without ever dealing with a format string again". I think I really just want to see a library like python-dateutil in Java, which I suppose would mean I should make it if I want it so bad! – Max Aug 03 '10 at 01:53
  • I guess our definitions of usability are different too. The date class I had seen was able to parse dates from around 30 different web services. Using the date class was as simple as parse(date), so as a user of the utility I did not have to worry about date formats. The writer of the utility did the worrying for me. – Robert Diana Aug 03 '10 at 02:01
  • @RobertDiana Thanks for your suggestion. Can you please specify package for DateUtil since I can find only DateUtil.parse methods where passing patterns is mandatory. – NameNotFoundException Feb 15 '18 at 12:20
1
// Requires the PrettyTime NLP library, which provides PrettyTimeParser.
import java.util.Date;
import org.ocpsoft.prettytime.nlp.PrettyTimeParser;

String str = "2020.03.03";
Date date = new PrettyTimeParser().parseSyntax(str).get(0).getDates().get(0);
System.out.println(date);
gehbiszumeis
Mahdi
-2

I don't know how this parsing is done in Python. In Java it can be done like this:

SimpleDateFormat sdf1 = new SimpleDateFormat("dd-MM-yyyy");
java.util.Date normalDate = null;
java.sql.Date sqlDate = null;
normalDate = sdf1.parse(date);
sqlDate = new java.sql.Date(normalDate.getTime());
System.out.println(sqlDate);

I think Python has predefined functions similar to Java's. You can follow this method, which parses a date String in dd-MM-yyyy format into a java.sql.Date:

import java.text.ParseException;
import java.text.SimpleDateFormat;

public class HelloWorld {
    public static void main(String[] args) {
        String date = "26-12-2019";
        SimpleDateFormat sdf1 = new SimpleDateFormat("dd-MM-yyyy");
        if (!date.isEmpty()) {
            try {
                // Parse the text, then convert to java.sql.Date via epoch milliseconds.
                java.util.Date normalDate = sdf1.parse(date);
                java.sql.Date sqlDate = new java.sql.Date(normalDate.getTime());
                System.out.println(sqlDate);
            } catch (ParseException e) {
                // Report the failure instead of silently swallowing it.
                System.err.println("Could not parse date: " + e.getMessage());
            }
        }
    }
}

execute this!

  • 2
    Please don’t teach the young ones to use the long outdated and notoriously troublesome `SimpleDateFormat` class. At least not as the first option. And not without any reservation. Today we have so much better in [`java.time`, the modern Java date and time API,](https://docs.oracle.com/javase/tutorial/datetime/) and its `DateTimeFormatter`. – Ole V.V. Dec 24 '19 at 15:30
  • If we know how to solve the problem, then we will look into the latest updates. Now we got a solution, we will try to get much better one. Anyways, thanks for your update! – Shashidhar Reddy Dec 25 '19 at 12:41
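Picking up the java.time suggestion from the comment above, the same dd-MM-yyyy parse might look roughly like this with LocalDate and DateTimeFormatter. This is only a sketch; the class name `ModernDateDemo` is hypothetical:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class ModernDateDemo {
    // DateTimeFormatter is immutable and thread-safe, so it can be shared as a constant.
    private static final DateTimeFormatter DAY_MONTH_YEAR = DateTimeFormatter.ofPattern("dd-MM-uuuu");

    // Throws DateTimeParseException (unchecked) if the text does not match the pattern.
    static LocalDate parse(String text) {
        return LocalDate.parse(text, DAY_MONTH_YEAR);
    }

    public static void main(String[] args) {
        LocalDate date = parse("26-12-2019");
        System.out.println(date); // prints 2019-12-26
        // If a java.sql.Date is still needed, java.sql.Date.valueOf(date) converts it.
    }
}
```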