
Why does the following code output -0001-11-28T00:00:00Z instead of 0000-00-00T00:00:00Z?

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.text.ParseException;
import java.util.Date;
import java.util.TimeZone;

class Main
{
    public static void main (String[] args) throws ParseException
    {
        DateFormat parser = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss");
        parser.setTimeZone(TimeZone.getTimeZone("GMT"));
        Date date = parser.parse("0000:00:00 00:00:00");
        System.out.println(date.toInstant());
    }
}

My first thought was that this was a time-zone problem, but the output is a whopping 34 days earlier than the expected date.

This is a 3rd-party library, so I cannot actually modify the code, but if I can understand why it returns this value, then maybe I can tweak the inputs to get the desired output.

In case you're wondering, 0000:00:00 00:00:00 comes from EXIF metadata of images or videos.

Gili
  • I'd check the source code and maybe debug, but presumably the issue is in how month 0 and day 0 are interpreted during parsing given that they don't make sense in a formatted date (I don't think!) – Chris Jan 31 '21 at 20:56
  • Can be reproduced on ideone: https://ideone.com/QUc6P8 – xanatos Jan 31 '21 at 20:58
  • First of all, there was no year numbered zero (the concept of zero didn't exist). Dates go from 1 BC to 1 AD, which explains the year being `-0001`. As to the additional 34 days... there's probably a good explanation. At least 10 days of that shift may have to do with the Gregorian calendar adjustment. This may _also_ have to do with the fact that `java.util.Date` is deprecated. You should try this with the `java.time.*` classes. – Jim Garrison Jan 31 '21 at 21:00
  • Even month 0 and day 0 are illegal... Months are 1-12, and days are 1-31 (or less depending on the month and the year) – xanatos Jan 31 '21 at 21:04
  • Trying with `2000:00:00 00:00:00` looks like it basically takes `2000-01-01` as a starting point then subtracts a month and one day from it for the reasons given by @xanatos immediately above, ending up with `1999-11-30T00:00:00Z` – Martin Smith Jan 31 '21 at 21:19
  • 0000 numbered year issue aside, there's some interesting points in history where calendars were changed/adjusted like when 10 whole days were lost when the Gregorian calendar was adopted in 1582 and Nov 18 1883 in the US when standardized timezones were adopted - if you manipulate dates before/after this point you'll find you lose 7mins 2seconds. – Kevin Hooke Jan 31 '21 at 21:35
  • Following up on @KevinHooke's comment about the Gregorian calendar: If you use `01` for the day and month, then you get the expected output using years from 1583 onwards (example: `1583-01-01T00:00:00Z`). Prior to 1582 (when there was that 10 day shift), there is a step backwards of 1 year at the turn of each century, the further back in time you go. But also, if the century was a leap-year, it looks like that canceled out the effect of this backwards step. – andrewJames Feb 02 '21 at 19:03

3 Answers


Note that the legacy API does not differentiate between year-of-era and year. Year 0 is actually 1 BC. Month 0 and day 0 are invalid values, but instead of throwing an exception, SimpleDateFormat (which is lenient by default) accepts them and rolls them over into the preceding month and year.
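To see the difference that leniency makes, here is a minimal sketch that switches it off with `setLenient(false)`. The class name `StrictParseDemo` and the helper `parsesStrictly` are made up for illustration; the pattern and time zone are the ones from the question. This only helps if you control the parser, which may not be the case with a 3rd-party library.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

class StrictParseDemo {
    /** Returns true if the text parses when lenient mode is switched off. */
    static boolean parsesStrictly(String text) {
        SimpleDateFormat parser = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss");
        parser.setTimeZone(TimeZone.getTimeZone("GMT"));
        // Reject out-of-range fields instead of rolling them over.
        parser.setLenient(false);
        try {
            parser.parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesStrictly("0000:00:00 00:00:00")); // all-zero timestamp
        System.out.println(parsesStrictly("2021:01:31 20:56:00")); // ordinary timestamp
    }
}
```

With leniency off, the all-zero timestamp should be rejected with a ParseException, while an ordinary timestamp still parses.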

The reason for the month being converted to 11:

SimpleDateFormat decreases the month number in the text by 1 because months in java.util.Date are 0-based. In other words, month 1 is parsed by SimpleDateFormat as 0, which is January for java.util.Date. Similarly, month 0 is parsed by SimpleDateFormat as -1. A negative month is then treated by java.util.Date as follows:

month = CalendarUtils.mod(month, 12);

and the CalendarUtils#mod has been defined as follows:

public static final int mod(int x, int y) {
    return (x - y * floorDivide(x, y));
}
public static final int floorDivide(int n, int d) {
    return ((n >= 0) ?
            (n / d) : (((n + 1) / d) - 1));
}

Thus, CalendarUtils.mod(-1, 12) returns 11.
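The same arithmetic can be tried out with the public `Math.floorMod` and `Math.floorDiv` methods (Java 8+), used here as stand-ins for the internal `CalendarUtils` helpers. This is only a sketch: the class `MonthWrapDemo` and its method names are invented for illustration, and the year carry is shown as an assumption about how a wrapped month spills into the previous year.

```java
class MonthWrapDemo {
    /** Wraps a zero-based month index into the range 0..11 using a floor modulus. */
    static int wrapMonth(int zeroBasedMonth) {
        return Math.floorMod(zeroBasedMonth, 12);
    }

    /** Number of whole years to carry when the month index leaves 0..11. */
    static int yearCarry(int zeroBasedMonth) {
        return Math.floorDiv(zeroBasedMonth, 12);
    }

    public static void main(String[] args) {
        int parsedMonth = 0 - 1; // text "00" minus 1 for the zero-based index
        System.out.println(wrapMonth(parsedMonth)); // 11, i.e. December
        System.out.println(yearCarry(parsedMonth)); // -1, i.e. the previous year
        System.out.println(-1 % 12);                // -1: the plain % operator keeps the sign
    }
}
```

The last line shows why a floor modulus is needed: Java's `%` operator would leave the month at -1 instead of wrapping it to 11.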

java.util.Date and SimpleDateFormat are full of such surprises. It is recommended to stop using them completely and switch to the modern date-time API.

The modern date-time API:

The modern date-time API differentiates between year-of-era and year using y and u respectively.

y specifies the year-of-era (the era is AD or BC) and is always a positive number, whereas u specifies the year, which is a signed (+/-) number.

Normally, we do not write a + sign in front of a positive number, but we always write a negative number with a - sign. The same rule applies to a year. As long as you use a year in the AD era, y and u give you the same number. However, you get different numbers for a year in the BC era: for example, year-of-era 1 BC is year 0, year-of-era 2 BC is year -1, and so on.

You can understand it better with the following demo:

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class Testing {
    public static void main(String[] args) {
        System.out.println(LocalDate.of(-1, 1, 1).format(DateTimeFormatter.ofPattern("u M d")));
        System.out.println(LocalDate.of(-1, 1, 1).format(DateTimeFormatter.ofPattern("y M d")));
        System.out.println(LocalDate.of(-1, 1, 1).format(DateTimeFormatter.ofPattern("yG M d")));

        System.out.println();

        System.out.println(LocalDate.of(0, 1, 1).format(DateTimeFormatter.ofPattern("u M d")));
        System.out.println(LocalDate.of(0, 1, 1).format(DateTimeFormatter.ofPattern("y M d")));
        System.out.println(LocalDate.of(0, 1, 1).format(DateTimeFormatter.ofPattern("yG M d")));

        System.out.println();

        System.out.println(LocalDate.of(1, 1, 1).format(DateTimeFormatter.ofPattern("u M d")));
        System.out.println(LocalDate.of(1, 1, 1).format(DateTimeFormatter.ofPattern("y M d")));
        System.out.println(LocalDate.of(1, 1, 1).format(DateTimeFormatter.ofPattern("yG M d")));
    }
}

Output:

-1 1 1
2 1 1
2BC 1 1

0 1 1
1 1 1
1BC 1 1

1 1 1
1 1 1
1AD 1 1

How does the modern date-time API treat 0000:00:00 00:00:00?

import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

class Main {
    public static void main(String[] args) {
        DateTimeFormatter parser = DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss")
                                                    .withZone(ZoneOffset.UTC)
                                                    .withLocale(Locale.ENGLISH);
        
        ZonedDateTime zdt = ZonedDateTime.parse("0000:00:00 00:00:00", parser);
    }
}

Output:

Exception in thread "main" java.time.format.DateTimeParseException: Text '0000:00:00 00:00:00' could not be parsed: Invalid value for MonthOfYear (valid values 1 - 12): 0
....

With DateTimeFormatter#withResolverStyle(ResolverStyle.LENIENT):

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.ResolverStyle;
import java.util.Locale;

public class Main {
    public static void main(String[] args) {
        DateTimeFormatter dtf = DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss", Locale.ENGLISH)
                .withResolverStyle(ResolverStyle.LENIENT);
        String str = "0000-00-00 00:00:00";

        LocalDateTime ldt = LocalDateTime.parse(str, dtf);
        System.out.println(ldt);
    }
}

Output:

-0001-11-30T00:00
Arvind Kumar Avinash
  • How does `-0001-11-28T00:00:00Z` relate to the input `0000-00-00T00:00:00Z`? – Martin Smith Jan 31 '21 at 21:10
  • @MartinSmith - `SimpleDateFormat` is full of such surprises. Currently, I am writing some code to find out the reason for the difference. I'll update my answer with my findings. – Arvind Kumar Avinash Jan 31 '21 at 21:14
  • Can you please reword the section `difference occurs when you use a year of the era`? I don't understand what you are trying to say there. It sounds like a run-on sentence. – Gili Jan 31 '21 at 21:15
  • @Gili - I've reworded it. I hope it's clear now. What I've tried to say is that for `AD` the numbers will be the same while for `BC` the numbers will be different. I've given a couple of examples in the same sentence. – Arvind Kumar Avinash Jan 31 '21 at 21:19

As explained by other answers, this is a result of processing an invalid timestamp (invalid year, month and day values) with a legacy class (SimpleDateFormat) that doesn't do proper validation.

In short ... garbage in, garbage out [1].

Solutions:

  1. Rewrite the code that uses SimpleDateFormat to use the new date / time classes introduced in Java 8. (Or use a backport if you have to use Java 7 and earlier.)

  2. Work around the problem by testing for this specific case before you attempt to process the string as a date.

    It seems from the context that "0000:00:00 00:00:00" is the EXIF way of saying "no such datetime". If that's the case, then trying to treat it as a datetime seems counter-productive. Treat it as a special case instead.

  3. If you can't rewrite the code or work around the problem, submit a bug report and/or patch against the (3rd party) library and hope for the best ...
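Option 2 could look roughly like the sketch below. It assumes that the exact string "0000:00:00 00:00:00" is the only placeholder you need to handle; the class `ExifDates` and the method `parseExif` are hypothetical names, not part of any library. The check runs before any parsing, so the same guard could sit in front of a call into the 3rd-party code instead of the `java.time` parser used here.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

class ExifDates {
    // Assumption: this exact string means "no timestamp recorded".
    private static final String UNSET = "0000:00:00 00:00:00";

    private static final DateTimeFormatter EXIF_FORMAT =
            DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss");

    /** Returns an empty Optional for the placeholder, otherwise the parsed value. */
    static Optional<LocalDateTime> parseExif(String text) {
        if (text == null || UNSET.equals(text.trim())) {
            return Optional.empty();
        }
        // Any other malformed value still fails with DateTimeParseException.
        return Optional.of(LocalDateTime.parse(text.trim(), EXIF_FORMAT));
    }

    public static void main(String[] args) {
        System.out.println(parseExif("0000:00:00 00:00:00")); // Optional.empty
        System.out.println(parseExif("2021:01:31 20:56:00")); // Optional[2021-01-31T20:56]
    }
}
```

Returning an Optional makes the "no such datetime" case explicit to callers instead of handing them a misleading date in 2 BC.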


1 - Why the discrepancy is exactly 1 year and 34 days is a bit of a mystery, but I'm sure you could figure out the explanation by diving into the source code. IMO, it is not worth the effort. However, I can't imagine why the Gregorian shift would be implicated in this ...
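One way to investigate the remaining two days without reading the JDK source is to read the parsed Date back through a GregorianCalendar, and compare that with `toInstant()`. GregorianCalendar is documented as a hybrid calendar that follows Julian rules before the 1582 cutover, while `Instant` prints ISO (proleptic Gregorian) dates. The sketch below uses the invented names `CalendarShiftDemo`, `parseLegacy` and `legacyFields`, and pins `Locale.ROOT` as an assumption, to avoid a non-Gregorian default calendar.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

class CalendarShiftDemo {
    /** Parses with the pattern and zone from the question, using lenient defaults. */
    static Date parseLegacy(String text) {
        SimpleDateFormat parser = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss", Locale.ROOT);
        parser.setTimeZone(TimeZone.getTimeZone("GMT"));
        try {
            return parser.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Reads the date back through the legacy hybrid Julian/Gregorian calendar. */
    static String legacyFields(Date date) {
        GregorianCalendar cal = new GregorianCalendar(TimeZone.getTimeZone("GMT"));
        cal.setTime(date);
        String era = cal.get(Calendar.ERA) == GregorianCalendar.BC ? "BC" : "AD";
        return cal.get(Calendar.YEAR) + " " + era
                + ", month " + (cal.get(Calendar.MONTH) + 1)
                + ", day " + cal.get(Calendar.DAY_OF_MONTH);
    }

    public static void main(String[] args) {
        Date date = parseLegacy("0000:00:00 00:00:00");
        System.out.println(legacyFields(date)); // legacy (Julian-era) view of the date
        System.out.println(date.toInstant());   // ISO proleptic Gregorian view
    }
}
```

If the legacy view shows 30 November of 2 BC while the instant prints as -0001-11-28, then one month and one day come from the zero month and day, and the other two days come from converting a Julian date to the proleptic Gregorian calendar. That would also match the `-0001-11-30` that the lenient `java.time` parser produces in the other answer, since `java.time` is proleptic throughout.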

Stephen C

This is because year 0 is invalid: it doesn't exist. https://en.m.wikipedia.org/wiki/Year_zero

The month and day are also invalid because they are 0.

beatrice
  • This doesn't explain the extra 34 days – Jim Garrison Jan 31 '21 at 21:03
  • There isn't 34 day difference, because you can't compare it to an invalid input. The 34 day difference would exist only if the input would be 0001-01-01 and the result would be still -0001-11-28. – beatrice Jan 31 '21 at 21:25