1

I'd like to parse a string to an object representing a timestamp, but I'm having some trouble adding the right timezone to it.

Specifically, when parsing, I can't find a way to make a distinction between strings that have a timezone-offset added, and those that don't.

Use case

I'm reading in several xml-files. In those files, several timestamps are present, which can be either UTC, in my local timezone, or some third timezone.
Or they may not have any timezone-information as part of the string, in which case we should fall back to a default-timezone, specified elsewhere in the xml. That default can, again, be either UTC, my local timezone, or a third zone.
Ultimately, I want to translate all of these timestamps to UTC.

So, I may have the following data (on a PC with system-timezone Europe/Amsterdam, currently UTC+2):

File_one.xml

<data>
  <timestamp eventName="First">2023-5-1T12:01:00Z</timestamp>
  <timestamp eventName="Second">2023-5-1T12:02:00+02:00</timestamp>
  <timestamp eventName="Third">2023-5-1T12:03:00+04:00</timestamp>
  <timestamp eventName="Fourth">2023-5-1T12:04:00</timestamp>
</data>
<configuration>
  <timezone>Europe/Amsterdam</timezone>
</configuration>

And file_two.xml

<data>
  <timestamp eventName="Fifth">2023-5-1T12:05:00Z</timestamp>
  <timestamp eventName="Sixth">2023-5-1T12:06:00+02:00</timestamp>
  <timestamp eventName="Seventh">2023-5-1T12:07:00+04:00</timestamp>
  <timestamp eventName="Eighth">2023-5-1T12:08:00</timestamp>
</data>
<configuration>
  <timezone>America/New_York</timezone>
</configuration>

Which, after all parsing, should result in the following timestamps in UTC:

First   2023-05-01T12:01:00
Second  2023-05-01T10:02:00
Third   2023-05-01T08:03:00
Fourth  2023-05-01T10:04:00
Fifth   2023-05-01T12:05:00
Sixth   2023-05-01T10:06:00
Seventh 2023-05-01T08:07:00
Eighth  2023-05-01T16:08:00

My main problem, what this question is about, is distinguishing Eighth from Sixth.

Approaches that don't work.

I've tried using both DateTime and DateTimeOffset, with their Parse/TryParse-methods. And both seem to assume my local timezone when parsing the values without a timezone. Adding a timezone later on for values with a local timezone isn't going to work either, because that would mean messing up those values that actually are in my local timezone, i.e. Second and Sixth.

Another approach I tried was using TryParseExact to first parse those values with or without timezone-information, but unfortunately my actual timestamps aren't as tidy as the example here, and I'm not sure about all the exact formats that I can expect. What I would like is just distinguishing between an offset and no offset at all.

(I may have mixed up the words timezone and time-offset. In this context, the difference doesn't really matter)

Emil Bode
  • 1,784
  • 8
  • 16
  • 1
    I love that you add a 'don't work' section, but maybe next time put in actual code samples of what you tried. – sommmen May 01 '23 at 14:00
  • I would think that you could detect the absence of a time zone with a regular expression. Then append your default time zone to the string before parsing with DateTimeOffset. – Jack A. May 01 '23 at 14:32

2 Answers2

1

I ran into a similar situation so i'll talk about what i did in that case in the hopes that it will send you on the right way. If its completely off, let me know in a comment and i'll remove this.

I'm assuming you have no control over how the files are generated, and you can't just fix the documents to use a single format including timezone.

I have a couple tabular data files in various formats (excel/csv/tsv/json and more) which all use various formats for date time. I wrote the the following helper method which checks against a bunch of formats for various cases from most to least-specific.

public static bool TryParseExactDayMonthYear(ReadOnlySpan<char> s, out DateTime result) => TryParseExactDayMonthYear(s, CultureInfo.InvariantCulture, DateTimeStyles.None, out result);

/// <summary>
/// CultureInfo.InvariantCulture uses en-US notation, meaning day/month are twisted compared to the EU.
/// This tries various formats to get at least a D-M-Y datetime.
/// The datetime is set to UTC by default
/// </summary>
/// <param name="s"></param>
/// <param name="provider"></param>
/// <param name="style"></param>
/// <param name="result"></param>
/// <returns></returns>
public static bool TryParseExactDayMonthYear(ReadOnlySpan<char> s, IFormatProvider? provider, DateTimeStyles style, out DateTime result)
{
    if (DateTime.TryParseExact(s, "dd-MM-yyyy HH:mm", provider, style, out result)
        || DateTime.TryParseExact(s, "d-M-yyyy HH:mm", provider, style, out result)
        || DateTime.TryParseExact(s, "dd-MM-yyyy HH:mm:ss", provider, style, out result)
        || DateTime.TryParseExact(s, "d-M-yyyy HH:mm:ss", provider, style, out result)
        || DateTime.TryParseExact(s, "d-M-yyyy", provider, style, out result)
        || DateTime.TryParseExact(s, "dd-MM-yyyy", provider, style, out result)
        || DateTime.TryParseExact(s, "d-M-yy", provider, style, out result)
        || DateTime.TryParseExact(s, "dd-MM-yy", provider, style, out result)

        // Germany (DerKurrier)
        || DateTime.TryParseExact(s, "dd.MM.yyyy HH:mm", provider, style, out result)
        || DateTime.TryParseExact(s, "d.M.yyyy HH:mm", provider, style, out result)
        || DateTime.TryParseExact(s, "dd.MM.yyyy HH:mm:ss", provider, style, out result)
        || DateTime.TryParseExact(s, "d.M.yyyy HH:mm:ss", provider, style, out result)
        || DateTime.TryParseExact(s, "d.M.yyyy", provider, style, out result)
        || DateTime.TryParseExact(s, "dd.MM.yyyy", provider, style, out result)
        || DateTime.TryParseExact(s, "d.M.yy", provider, style, out result)
        || DateTime.TryParseExact(s, "dd.MM.yy", provider, style, out result)
        
        // Last resort
        || DateTime.TryParse(s, provider, style, out result)

        // For excel files created on a dutch machine...
        || DateTime.TryParse(s, CultureInfo.GetCultureInfo("nl-NL"), style, out result)
       )
    {
        result = DateTime.SpecifyKind(result, DateTimeKind.Utc);
        return true;
    }

    return false;
}

Speed here was not important for me. I can imagine you can adopt the same approach, passing different formats, timezones and cultures so that your cases are covered. (it's not pretty i know).

Another option i'd consider exploring is using the library NodaTime, which was created for dealing with these types of situations.

See for ex. this doc page:

https://nodatime.org/2.2.x/userguide/type-choices

Nodatime also inclused a timezone db which can work with values like 'Europe/Amsterdam' (which .nets TimeZoneInfo does not use. See for example this snippet that converts a php datetime type to a .net DateTime;

private static DateTime ParsePhpDateTimeString(string dateString, string timezone, long timezoneType)
{
    var localDateTimePattern = LocalDateTimePattern.CreateWithInvariantCulture("yyyy-MM-dd HH:mm:ss.ffffff");
    var localDateTime = localDateTimePattern.Parse(dateString).GetValueOrThrow();

    // See: https://stackoverflow.com/a/17711005/4122889
    //  Type 1; A UTC offset, such as in new DateTime("17 July 2013 -0300");
    //  Type 2; A timezone abbreviation, such as in new DateTime("17 July 2013 GMT");
    //  Type 3: A timezone identifier, such as in new DateTime("17 July 2013", new DateTimeZone("Europe/London"));

    switch (timezoneType)
    {
        case 1:
            var offSetPattern = OffsetPattern.CreateWithInvariantCulture("+HH:mm");
            var offset = offSetPattern.Parse(timezone).Value;
            var zonedDateTimeFromOffset = localDateTime.InZoneStrictly(DateTimeZone.ForOffset(offset));
            return zonedDateTimeFromOffset.ToDateTimeUtc();
        case 2:
            throw new NotSupportedException("Not (Yet) support converting from timeZonetype 2 - but doable to add in!");
        case 3:
            var dateTimeZone = DateTimeZoneProviders.Tzdb[timezone];
            var zonedDateTime = dateTimeZone.AtStrictly(localDateTime);
            var dateTimeUtc = zonedDateTime.ToDateTimeUtc();
            return dateTimeUtc;
        default:
            throw new ArgumentOutOfRangeException(nameof(timezoneType));
    }
}
sommmen
  • 6,570
  • 2
  • 30
  • 51
  • Thanks, looks like an approach I can use as a starting point. I'm going to look into the Nodatime-library, it looks promising. – Emil Bode May 01 '23 at 14:52
0

The input dates consists of just two type:

  • unambiguous (2023-5-1T12:01:00Z, 2023-5-1T12:02:00+02:00 etc)
  • ambiguous (2023-5-1T12:04:00, 2023-5-1T12:08:00)
    • fortunately you know the IANA timezone name

For unambiguous dates you can use DateTimeOffset.ParseExact. The result contains necessary information to convert the datetime to UTC.

For ambiguous dates you need to convert the IANA timezone name (Europe/Amsterdam and America/New_York) to something that Windows understand (W. Europe Standard Time and Eastern Standard Time). This could be done with TimeZoneInfo.TryConvertIanaIdToWindowsId function but it does not work out of the box.

If you get it to work, you can use TimeZoneInfo.GetUtcOffset to get the UTC offset for that timezone on that particular date, then build a DateTimeOffset object:

using System.Text.RegularExpressions;

List<(string dateString, string timezoneName)> tests = new()
{
    ("2023-5-1T12:01:00Z", "Europe/Amsterdam"),
    ("2023-5-1T12:02:00+02:00", "Europe/Amsterdam"),
    ("2023-5-1T12:03:00+04:00", "Europe/Amsterdam"),
    ("2023-5-1T12:04:00", "Europe/Amsterdam"),
    ("2023-5-1T12:05:00Z", "America/New_York"),
    ("2023-5-1T12:06:00+02:00", "America/New_York"),
    ("2023-5-1T12:07:00+04:00", "America/New_York"),
    ("2023-5-1T12:08:00", "America/New_York")
};
foreach (var test in tests)
{
    string dateString = test.dateString;
    string timezoneName = test.timezoneName;
    Regex tzTest = new Regex("(?:Z|[+-]\\d\\d:\\d\\d)$", RegexOptions.Compiled);
    DateTimeOffset result;
    if (tzTest.IsMatch(dateString))
    {
        result = DateTimeOffset.ParseExact(dateString, "yyyy-M-d\\THH:mm:ssK", null);
    }
    else
    {
        TimeZoneInfo.TryConvertIanaIdToWindowsId(timezoneName, out string? timezoneName_w);
        if (timezoneName_w == null)
        {
            throw new ArgumentException($"Timezone name conversion failed for {timezoneName}");
        }
        DateTime datetime = DateTime.ParseExact(dateString, "yyyy-M-d\\THH:mm:ss", null);
        TimeSpan offset = TimeZoneInfo.FindSystemTimeZoneById(timezoneName_w).GetUtcOffset(datetime);
        result = new DateTimeOffset(datetime, offset);
    }
    Console.WriteLine("{0} {1} => {2}", dateString.PadRight(24), timezoneName.PadRight(20), result.ToUniversalTime().ToString("o"));
}

Result:

2023-5-1T12:01:00Z       Europe/Amsterdam     => 2023-05-01T12:01:00.0000000+00:00
2023-5-1T12:02:00+02:00  Europe/Amsterdam     => 2023-05-01T10:02:00.0000000+00:00
2023-5-1T12:03:00+04:00  Europe/Amsterdam     => 2023-05-01T08:03:00.0000000+00:00
2023-5-1T12:04:00        Europe/Amsterdam     => 2023-05-01T10:04:00.0000000+00:00
2023-5-1T12:05:00Z       America/New_York     => 2023-05-01T12:05:00.0000000+00:00
2023-5-1T12:06:00+02:00  America/New_York     => 2023-05-01T10:06:00.0000000+00:00
2023-5-1T12:07:00+04:00  America/New_York     => 2023-05-01T08:07:00.0000000+00:00
2023-5-1T12:08:00        America/New_York     => 2023-05-01T16:08:00.0000000+00:00
Salman A
  • 262,204
  • 82
  • 430
  • 521