
I have a scenario where I receive date strings from a third-party email server in patterns such as these:

  • Mon, 13 Mar 2017 19:00:10 +0530 (IST)
  • Tue, 21 Mar 2017 09:23:00 -0700 (PDT)
  • Sun, 12 Mar 2017 14:31:13 +0000 (UTC)

That is, only the time zone changes. I can easily parse these using Java's SimpleDateFormat, for example:

String pattern = "EEE, dd MMM yyyy HH:mm:ss Z '('z')'";
SimpleDateFormat df = new SimpleDateFormat(pattern);
df.parse("Fri, 31 Mar 2017 13:31:14 +0530 (IST)");

But with DateTimeFormat from the Joda-Time library, I'm not able to use the same pattern:

String pattern = "EEE, dd MMM yyyy HH:mm:ss Z '('z')'";
DateTimeFormatter parser = DateTimeFormat.forPattern(pattern);
parser.parseDateTime("Fri, 31 Mar 2017 13:31:14 +0530 (IST)");

What am I missing here?

Shashank Agrawal
  • According to the javadoc (http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html) - **Zone names:** Time zone names ('z') cannot be parsed. So this field is used only for format/toString() – Mar 30 '17 at 16:06
  • Take a look at http://stackoverflow.com/a/4498499/7605325 –  Mar 30 '17 at 16:59
  • How is an offset of `-0700` described as UTC? The only five-and-a-half-hour offset I know of is India's, but that would be a negative offset while yours is positive. Are these actual values, or did you assemble these examples incorrectly? – Basil Bourque Mar 30 '17 at 17:28
  • @BasilBourque sorry, the example was wrong. I fixed it. – Shashank Agrawal Mar 30 '17 at 17:29
  • Is the first one correct, with a positive offset for IST? – Basil Bourque Mar 30 '17 at 17:30
  • Do you understand those three are *not* the same simultaneous moment? 1:30 PM happens much earlier in India than in UTC, and 1:30 PM on the west coast of North America happens much later. – Basil Bourque Mar 30 '17 at 17:33
  • @BasilBourque those 3 examples are not really connected to each other. They must be confusing because they share the same timestamp. They are just random times I picked from my production server logs. I'll update the question to be less confusing. I basically have an endpoint where a third-party SMTP server posts incoming email data, which comes with different time zones. :) – Shashank Agrawal Mar 30 '17 at 17:38
  • @BasilBourque updated the question. Please check. Sorry for the noise :p – Shashank Agrawal Mar 30 '17 at 17:41
  • @Hugo I really missed that line even though I read the associated paragraph. I'll check it out. – Shashank Agrawal Mar 30 '17 at 17:42
  • Whoops, I was wrong about IST for India, where the offset is indeed positive as you show in your example. – Basil Bourque Mar 30 '17 at 17:54
  • Yeah, but anyway, thanks for pointing out the wrong examples I posted :) – Shashank Agrawal Mar 30 '17 at 17:55

1 Answer


tl;dr

String input = "Mon, 13 Mar 2017 19:00:10 +0530 (IST)";
int index = input.indexOf ( " (" ); // Searching for SPACE + LEFT PARENTHESIS.
String inputModified = input.substring ( 0 , index ); // "Mon, 13 Mar 2017 19:00:10 +0530"

Instant instant = 
    OffsetDateTime.parse ( 
        inputModified , 
        DateTimeFormatter.ofPattern( "EEE, d MMM uuuu HH:mm:ss Z" , Locale.US ) // Locale.US so English "Mon"/"Mar" parse on any JVM.
    ).toInstant() 
;

See similar code run live at IdeOne.com.

Using java.time

FYI: The Joda-Time project, now in maintenance mode, advises migration to the java.time classes.

Regarding a "two timezone format in Joda-Time" such as:

Mon, 13 Mar 2017 19:00:10 +0530 (IST)

No, that is a zero time zone format: neither part is a true time zone.

The +0530 is an offset-from-UTC, a number of hours and minutes ahead of or behind UTC, not a time zone.
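To make the offset versus zone distinction concrete, here is a small sketch using only java.time (the class and method names are hypothetical, chosen for illustration). An offset is a fixed amount of time, while a zone is a set of rules whose offset depends on the moment: America/Los_Angeles yields the -07:00 of the PDT example in late March 2017, but -08:00 in January.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class; only the java.time calls are real API.
final class OffsetVersusZone {

    // Offset a zone observes at a given moment, per the JDK's bundled tz rules.
    static ZoneOffset offsetAt(String zoneName, String isoInstant) {
        return ZoneId.of(zoneName).getRules().getOffset(Instant.parse(isoInstant));
    }

    public static void main(String[] args) {
        // An offset is just a fixed amount: +05:30 is 19,800 seconds ahead of UTC.
        System.out.println(ZoneOffset.of("+05:30").getTotalSeconds()); // 19800

        // A zone is a history of offsets that varies with the date.
        System.out.println(offsetAt("Asia/Kolkata", "2017-03-13T13:30:10Z"));        // +05:30
        System.out.println(offsetAt("America/Los_Angeles", "2017-01-15T12:00:00Z")); // -08:00
        System.out.println(offsetAt("America/Los_Angeles", "2017-03-21T16:23:00Z")); // -07:00
    }
}
```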

Specify a proper time zone name in the format of continent/region, such as America/Montreal, Africa/Casablanca, or Pacific/Auckland. Never use 3-4 letter abbreviations such as EST or IST, as they are not true time zones, are not standardized, and are not even unique(!).
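As a sketch of the "not unique" problem: the JDK ships a fixed legacy table, ZoneId.SHORT_IDS, that maps some three-letter IDs to zones or offsets. It encodes one arbitrary choice per abbreviation: IST becomes India, BST becomes Bangladesh rather than British Summer Time, and EST becomes a fixed -05:00 with no Daylight Saving Time. The wrapper class and method below are hypothetical; only ZoneId.SHORT_IDS and ZoneId.of(String, Map) are real API.

```java
import java.time.ZoneId;

// Hypothetical demo class illustrating the JDK's legacy abbreviation table.
final class AbbreviationAmbiguity {

    // Returns the JDK's fixed mapping for a three-letter ID, or null if absent.
    static String jdkShortIdMapping(String abbreviation) {
        return ZoneId.SHORT_IDS.get(abbreviation);
    }

    public static void main(String[] args) {
        for (String abbr : new String[] {"IST", "BST", "EST"}) {
            System.out.println(abbr + " -> " + jdkShortIdMapping(abbr));
        }
        // Resolving an abbreviation only works when an alias map is supplied explicitly.
        System.out.println(ZoneId.of("IST", ZoneId.SHORT_IDS)); // Asia/Kolkata
    }
}
```

Another library or table may well pick Israel or Ireland for the same three letters, which is exactly why the abbreviation cannot be trusted.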

Since the 3-4 letter abbreviations cannot be reliably parsed, Joda-Time has a policy of refusing to try (as noted in the comment by Hugo above). I suspect this is a wise policy, given what we see next.

The java.time classes will attempt to guess when parsing such pseudo-time-zone names, but the result may not be your intended value. Indeed, java.time misinterprets your first example, apparently reading IST as Israel Standard Time out of choices that include India Standard Time, Ireland Standard Time, and possibly more.

String input = "Mon, 13 Mar 2017 19:00:10 +0530 (IST)";
DateTimeFormatter f = DateTimeFormatter.ofPattern( "EEE, d MMM uuuu HH:mm:ss Z '('z')'") ;
ZonedDateTime zdt = ZonedDateTime.parse ( input , f );

zdt.toString(): 2017-03-13T19:00:10+02:00[Asia/Jerusalem]

So I suggest you lop off the bogus abbreviation chunk at the end. Parse the remaining text as an OffsetDateTime, which at least gives you an exact moment on the timeline. Adjust into UTC as an Instant, as most of your work, including your logging, should generally be done in UTC.

Lop off the abbreviation using String::substring. Note that we include the SPACE before the LEFT PARENTHESIS in our indexOf search, as we want to delete both characters and everything after them.

String input = "Mon, 13 Mar 2017 19:00:10 +0530 (IST)";
int index = input.indexOf ( " (" ); // Searching for SPACE + LEFT PARENTHESIS.
String inputModified = input.substring ( 0 , index );

inputModified: Mon, 13 Mar 2017 19:00:10 +0530
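One caveat, relevant if some of your other inputs lack the abbreviation: when " (" is absent, indexOf returns -1 and substring(0, -1) throws StringIndexOutOfBoundsException. A minimal sketch of a hypothetical helper that removes a trailing parenthesized comment only when one is present, swapping indexOf for a regular expression:

```java
// Hypothetical helper class; the regex is an assumption about the input shape.
final class TrailingCommentStripper {

    // Removes a trailing parenthesized comment such as " (IST)" if present;
    // input without one is returned unchanged instead of throwing.
    static String stripZoneComment(String input) {
        return input.replaceFirst("\\s*\\([^()]*\\)\\s*$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripZoneComment("Mon, 13 Mar 2017 19:00:10 +0530 (IST)")); // Mon, 13 Mar 2017 19:00:10 +0530
        System.out.println(stripZoneComment("Mon, 13 Mar 2017 19:00:10 +0530"));       // unchanged
    }
}
```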

Parse as an OffsetDateTime object using the numerical offset at the end to guide us as to the exact moment of this value.

DateTimeFormatter f = DateTimeFormatter.ofPattern( "EEE, d MMM uuuu HH:mm:ss Z" , Locale.US ); // Locale.US so English "Mon"/"Mar" parse on any JVM.
OffsetDateTime odt = OffsetDateTime.parse ( inputModified , f );

odt.toString(): 2017-03-13T19:00:10+05:30

Extract an Instant object to give us the same moment in UTC.

Instant instant = odt.toInstant ();

instant.toString(): 2017-03-13T13:30:10Z
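Putting the steps together, here is a minimal sketch of a single helper, assuming every input follows the "EEE, d MMM uuuu HH:mm:ss Z (abbreviation)" shape from the question. The class and method names are hypothetical, and Locale.US is an added assumption so that English names such as "Mon" and "Mar" parse even when the JVM's default locale is not English.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper class combining the strip, parse, and toInstant steps.
final class MailDateParser {

    // English day/month names plus a numeric offset such as +0530 or -0700.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("EEE, d MMM uuuu HH:mm:ss Z", Locale.US);

    // Drops a trailing " (XYZ)" abbreviation if present, then returns the moment in UTC.
    static Instant toInstant(String input) {
        int index = input.indexOf(" (");
        String withoutAbbreviation = (index >= 0) ? input.substring(0, index) : input;
        return OffsetDateTime.parse(withoutAbbreviation, FORMAT).toInstant();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Mon, 13 Mar 2017 19:00:10 +0530 (IST)", // 2017-03-13T13:30:10Z
            "Tue, 21 Mar 2017 09:23:00 -0700 (PDT)", // 2017-03-21T16:23:00Z
            "Sun, 12 Mar 2017 14:31:13 +0000 (UTC)"  // 2017-03-12T14:31:13Z
        };
        for (String input : inputs) {
            System.out.println(toInstant(input));
        }
    }
}
```

Because only the numeric offset drives the result, the abbreviation never has to be interpreted at all.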

You can adjust into your own particular time zone if you insist. But I advise learning to think in UTC when wearing your Programmer hat. Think of UTC as “The One True Time”, and of all other zones as mere variations on that theme.

ZoneId z = ZoneId.of( "America/Montreal" );
ZonedDateTime zdt = instant.atZone( z );
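As a worked check of that adjustment, assuming the instant from the first example: Daylight Saving Time started in Canada and the US on 12 March 2017, so Montreal was at -04:00 the next day, and 13:30:10Z becomes 09:30:10 on the local wall clock. The class and method names below are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo class for viewing a UTC moment in Montreal.
final class MontrealView {

    // Same moment on the timeline, seen through Montreal's wall clock.
    static ZonedDateTime inMontreal(Instant instant) {
        return instant.atZone(ZoneId.of("America/Montreal"));
    }

    public static void main(String[] args) {
        Instant instant = Instant.parse("2017-03-13T13:30:10Z"); // UTC moment of the first example
        ZonedDateTime zdt = inMontreal(instant);
        System.out.println(zdt);                             // 2017-03-13T09:30:10-04:00[America/Montreal]
        System.out.println(zdt.toInstant().equals(instant)); // true: same moment, different wall clock
    }
}
```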

ISO 8601

The kind of pattern shown in your examples was common in protocols of yesteryear such as RFC 1123 / RFC 822.

Nowadays, the approach is to always use ISO 8601. In this modern standard, the formats are easy to read across various human cultures, have less reliance on the English language, are easy for machines to parse, and are designed to be unambiguous.

The java.time classes use ISO 8601 by default when generating/parsing strings. You can see their generated output in my examples above. Note that ZonedDateTime extends the standard by appending the name of the time zone in square brackets.
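A brief sketch of that default behavior (the class and method names are hypothetical): toString produces ISO 8601 text, and the matching parse method reads it back without any custom formatter or locale.

```java
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

// Hypothetical demo class showing an ISO 8601 round trip.
final class IsoRoundTrip {

    // The first example's value, built directly rather than parsed.
    static OffsetDateTime sample() {
        return OffsetDateTime.of(2017, 3, 13, 19, 0, 10, 0, ZoneOffset.ofHoursMinutes(5, 30));
    }

    public static void main(String[] args) {
        String text = sample().toString();                // ISO 8601 output
        OffsetDateTime back = OffsetDateTime.parse(text); // ISO 8601 input, no formatter needed
        System.out.println(text);                         // 2017-03-13T19:00:10+05:30
        System.out.println(back.equals(sample()));        // true
        System.out.println(back.toInstant());             // 2017-03-13T13:30:10Z
    }
}
```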

By the way, if you have similar inputs that comply exactly with RFC 1123, know that java.time provides a predefined formatter object, DateTimeFormatter.RFC_1123_DATE_TIME.
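A minimal sketch of using that predefined formatter, assuming input with no trailing abbreviation. The formatter has its English day and month names built in, so no Locale argument is needed, but it does not accept the parenthesized "(IST)" comment, so such input would still need stripping first. The class and method names are hypothetical.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical demo class for the predefined RFC 1123 formatter.
final class Rfc1123Demo {

    // Parses text that strictly follows RFC 1123, including a numeric offset.
    static OffsetDateTime parseRfc1123(String text) {
        return OffsetDateTime.parse(text, DateTimeFormatter.RFC_1123_DATE_TIME);
    }

    public static void main(String[] args) {
        System.out.println(parseRfc1123("Mon, 13 Mar 2017 19:00:10 +0530")); // 2017-03-13T19:00:10+05:30
        try {
            // The trailing comment is not part of the formatter's grammar, so parsing fails.
            parseRfc1123("Mon, 13 Mar 2017 19:00:10 +0530 (IST)");
        } catch (DateTimeParseException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```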

Basil Bourque
  • Great answer! Very helpful. I'm already saving all dates/times in the database and logs in UTC because, as you said, it is *The One True Time* :) I thought of chopping off the last part at first but was curious whether I was missing something, so I asked. Since I'm also getting this date string in various other patterns (from Mailgun, so I'm stuck with multiple patterns), I will adjust the chopping code to remove the last part only for this particular pattern. Thanks again! – Shashank Agrawal Mar 31 '17 at 03:28
  • @ShashankAgrawal You *could* use the offset along with the abbreviation to deduce that `+0530 (IST)` means India time (`Asia/Kolkata`) rather than Ireland etc. time. But time zones change their offsets, and they change surprisingly often. So ultimately you are just guessing, and such guessing code could break in the future. There is no real need to guess the intended time zone, as the offset alone gets you to UTC, which is all you need for logging and comparison purposes. Also, note the last sentence I added at the end: you say you have similar inputs; if they are exactly RFC 1123, use the predefined formatter. – Basil Bourque Mar 31 '17 at 03:52
  • I didn't get your point about **ultimately you are just guessing**. Can you please elaborate so that I can fix my code where I'm wrong? – Shashank Agrawal Mar 31 '17 at 05:42
  • @ShashankAgrawal As for "guessing", I am not referring to your code at all. I am referring to my comment about mapping `+0530 (IST)` to India time. While that is possible with what we know today, in the future, zone definitions and offsets can and will change. When those changes occur, any hard-coded mapping we make today will break. And so we would just be guessing as to the intended time zone. That is why, when someone wants to communicate a time zone, they should specify a [true time zone in `continent/region` format](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). – Basil Bourque Mar 31 '17 at 07:11
  • Oh, Great! Thanks for the clarification. – Shashank Agrawal Mar 31 '17 at 07:22