
Datetime values arrive in my program in two different formats: as Unix timestamps and as yyyy-MM-dd HH:mm:ss.S strings. For example, 1520877600 or 2018-04-23 11:12:00.0. I need to extract the year and month from these values, recognizing the format automatically.

This is the function that extracts the year from yyyy-MM-dd HH:mm:ss.S:

  def getYear(datetimeString: Any): Int = {
    var year = 2017
    if (!datetimeString.toString.isEmpty) {
      val dateFormat = "yyyy-MM-dd HH:mm:ss.S"
      val dtf = java.time.format.DateTimeFormatter.ofPattern(dateFormat)
      val d = java.time.LocalDate.parse(datetimeString.toString, dtf)
      year = d.getYear
    }
    year
  } 

And this is the same function for the Unix timestamp:

  def getYear(timestamp: Any): Int = {
    var year = 2017
    if (!timestamp.toString.isEmpty)
    {
      year = new DateTime(timestamp.toString.toLong).getYear
    }
    year
  }

How can I merge them into a single function, so that my program would be flexible and would work with both formats?

jwriteclub
ScalaBoy
  • Use one format, if that fails, use the other – MadProgrammer Apr 02 '18 at 08:45
  • Conceptually, consider creating a new wrapper function which takes in a variable `timestampOrDatetime`. Check to see if the variable is all digits, if it is, use your timestamp code. If not, use your datetime code. – jwriteclub Apr 02 '18 at 08:46
  • @jwriteclub: Good idea, thanks. I came out with this solution `def isAllDigits(x: String) = x forall Character.isDigit`. – ScalaBoy Apr 02 '18 at 08:58
  • @ScalaBoy great. You know you can go ahead and answer your own question so that the knowledge is more easily accessible for anyone who comes across it in the future. – jwriteclub Apr 02 '18 at 10:49
  • It's worth noting that you should consider isolating usage of auto-detection code to the edges of your system - internally it's best to store data in structures where the type is known, in a consistent form, or with metadata about the type. Auto-detection can be a bug breeding ground as when it goes wrong, it can be hard to notice. – Gareth Latty Apr 02 '18 at 11:24

4 Answers


You can use a java.time.format.DateTimeFormatterBuilder to build a formatter with optional parts, where each optional part is a DateTimeFormatter that can parse one of those formats.

I'm posting code in Java, because I'm not a Scala dev, but it shouldn't be hard to adapt it.

First you make the formatter for the date/time pattern:

DateTimeFormatter datetimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");

Then you make another formatter to parse the timestamp. The value 1520877600 seems to be in seconds since the Unix epoch, so you can use the ChronoField.INSTANT_SECONDS field:

DateTimeFormatter timestampFormatter = new DateTimeFormatterBuilder()
    // parse timestamp value in seconds
    .appendValue(ChronoField.INSTANT_SECONDS)
    // create formatter
    .toFormatter();

And then you join the 2 formatters above in a single one, making each formatter optional:

DateTimeFormatter fmt = new DateTimeFormatterBuilder()
    // date/time
    .appendOptional(datetimeFormatter)
    // timestamp
    .appendOptional(timestampFormatter)
    // use JVM default timezone
    .toFormatter().withZone(ZoneId.systemDefault());

Another detail is the withZone method, which sets the timezone used by the formatter. That's needed because a Unix timestamp is a count of elapsed time since the Unix epoch, so the value 1520877600 can correspond to a different date and time depending on the timezone you are in.

I'm using the JVM default timezone (ZoneId.systemDefault()), but you can set it to whatever you need. Example: if I use ZoneId.of("America/New_York"), the timestamp will be converted to the New York timezone. Using a different timezone can affect the values of year and month, especially if the value falls on the first or last day of the month. And if I don't set a timezone, parsing will fail for timestamps, because the formatter needs a timezone to "translate" the timestamp to a date/time.
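To make the month-boundary effect concrete, here is a small sketch. The class name ZoneBoundaryDemo, the helper yearMonthAt, and the sample value 1522540800 (midnight UTC on 2018-04-01) are my own choices for illustration, not something from the question:

```java
import java.time.Instant;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class: shows one instant landing in different months per zone.
final class ZoneBoundaryDemo {

    // Converts epoch seconds to the year/month observed in the given zone.
    static YearMonth yearMonthAt(long epochSeconds, ZoneId zone) {
        return YearMonth.from(Instant.ofEpochSecond(epochSeconds).atZone(zone));
    }

    public static void main(String[] args) {
        long midnightUtc = 1522540800L; // 2018-04-01T00:00:00Z (illustrative value)

        // In UTC the instant is already in April.
        System.out.println(yearMonthAt(midnightUtc, ZoneOffset.UTC)); // 2018-04

        // New York is 4 hours behind UTC on that date, so it is still March 31 there.
        System.out.println(yearMonthAt(midnightUtc, ZoneId.of("America/New_York"))); // 2018-03
    }
}
```

The same instant yields a different month in each zone, so it's worth choosing the zone deliberately instead of relying on whatever the JVM default happens to be.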

Anyway, as you want the year and month values, the best choice is to parse directly to a java.time.YearMonth, which in turn can be used to get the corresponding int values for year and month:

YearMonth ym = YearMonth.parse("2018-04-23 11:12:00.0", fmt);
int year = ym.getYear();
int month = ym.getMonthValue();

ym = YearMonth.parse("1520877600", fmt);
year = ym.getYear();
month = ym.getMonthValue();
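
Put together as a self-contained program, the snippets above might look like the sketch below. The class name FlexibleYearMonth and the helpers formatterFor and parse are illustrative names only, and I'm assuming a fixed zone (UTC) in main so the output doesn't depend on the machine's default timezone:

```java
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Hypothetical wrapper around the formatter built in this answer.
final class FlexibleYearMonth {

    // Builds a formatter that accepts either the date/time pattern or epoch seconds.
    static DateTimeFormatter formatterFor(ZoneId zone) {
        DateTimeFormatter datetimeFormatter =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");
        DateTimeFormatter timestampFormatter = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.INSTANT_SECONDS) // timestamp in seconds
            .toFormatter();
        return new DateTimeFormatterBuilder()
            .appendOptional(datetimeFormatter)
            .appendOptional(timestampFormatter)
            .toFormatter()
            .withZone(zone); // needed to turn a timestamp into a date/time
    }

    // Parses either input format directly to a YearMonth.
    static YearMonth parse(String input, ZoneId zone) {
        return YearMonth.parse(input, formatterFor(zone));
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneOffset.UTC; // assumption: pick the zone your data needs
        System.out.println(parse("2018-04-23 11:12:00.0", zone)); // 2018-04
        System.out.println(parse("1520877600", zone));            // 2018-03
    }
}
```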
uilon
1
  val isoFormat = "(\\d{4})-(\\d{2})-(\\d{2}) (\\d{2})\\:(\\d{2})\\:(\\d{2})\\.(\\d+)".r 

  def getYear(timestamp: Any): Int = timestamp match {
      case isoFormat(year, month, day, hour, minute, second, millis) => year.toInt
      case l : Long => {
        val c = Calendar.getInstance()
        c.setTimeInMillis(l)
        c.get(Calendar.YEAR)
      }
      case _ => 2017
  }

  println(getYear("2018-03-31 14:12:00.231"))
  println(getYear(System.currentTimeMillis()))
  println(getYear("Foo"))

This example uses Scala's pattern matching syntax. Let's start from the bottom:

  • If the given value is neither a proper string nor a long, return 2017 as the default (you might want to make this configurable)
  • If the value is a long, parse it - I used Calendar in this case to avoid the string conversions; you may want to add your time zone
  • If the value is an ISO-formatted string, use a regex to extract the fields we want. This may seem like compiler magic at first, but it is simply using Scala's unapply method for pattern matching. You can find a proper explanation here: REGULAR EXPRESSION PATTERNS . Note: This could be written more concisely; I kept the verbose version for clarity.

The main benefit I see with the above approach is that it will be very straightforward to extend the method with additional date formats.

sarcan
  • I'm a big fan of regex, but in this case I wouldn't use it because, although it might seem good for parsing a date, it's bad for validating it. Inputs like `9999-99-99 99:99:99.999` will be considered valid by the regex, while a proper class such as `DateTimeFormatter` will correctly throw an exception. Not to mention the tons of corner cases (day > 31, day zero, February 29th in a non-leap year, etc). I know the question is about "getting year from a date-like string", but I think that the input must be validated as being a correct date/time value, and regex is not the best tool for it – uilon Apr 02 '18 at 17:07
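
A rough sketch of that difference, in Java. The class ValidationDemo and its helper isValid are made-up names for illustration. One detail to be aware of: with the default SMART resolver an out-of-range month is rejected, but February 29th in a non-leap year is adjusted to February 28th instead of failing, so the sketch also shows a STRICT formatter (which needs `uuuu` instead of `yyyy`):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.regex.Pattern;

// Hypothetical demo class comparing regex matching with DateTimeFormatter validation.
final class ValidationDemo {

    // Same shape as the regex in the answer above.
    static final Pattern REGEX = Pattern.compile(
        "(\\d{4})-(\\d{2})-(\\d{2}) (\\d{2}):(\\d{2}):(\\d{2})\\.(\\d+)");

    // Default resolver style (SMART).
    static final DateTimeFormatter SMART =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");

    // STRICT resolver; 'uuuu' is used because strict mode needs an era with 'yyyy'.
    static final DateTimeFormatter STRICT =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss.S")
                         .withResolverStyle(ResolverStyle.STRICT);

    // Returns true if the formatter accepts the input as a date/time.
    static boolean isValid(String input, DateTimeFormatter fmt) {
        try {
            LocalDateTime.parse(input, fmt);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String nonsense = "9999-99-99 99:99:99.9";
        System.out.println(REGEX.matcher(nonsense).matches()); // true
        System.out.println(isValid(nonsense, SMART));          // false

        String feb29 = "2017-02-29 11:12:00.0"; // 2017 is not a leap year
        System.out.println(isValid(feb29, SMART));  // true (adjusted to Feb 28)
        System.out.println(isValid(feb29, STRICT)); // false
    }
}
```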

This code takes your inputs and gets the year. It uses SimpleDateFormat to convert.

import java.text.SimpleDateFormat
import java.util.{Calendar, Date, GregorianCalendar}
import scala.util.{Failure, Success, Try}

def recognizeTimeStamp(timeStamp: String): Int = {

  val myCal = new GregorianCalendar();
  timeStamp match {
    case "unknown" => -1
    case x if x.replaceAll("\\d", "") == "" => {
      myCal.setTime(new Date(x.toLong))
      myCal.get(Calendar.YEAR)
    }

    case x =>
       val format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S")
       Try(format.parse(x)) match {
            case Success(t) => {
                myCal.setTime(t)
                myCal.get(Calendar.YEAR)
            }
            case Failure(_) => -1
       }
   }
}

recognizeTimeStamp("2018-04-23 11:12:00.0")
recognizeTimeStamp("1334946600000")

Output from my Scala worksheet:

res0: Int = 2018
res1: Int = 2012
spiralarchitect
  • OP is using [`java.time` classes](https://docs.oracle.com/javase/tutorial/datetime), which is a more modern and [much better](https://madhuraoakblog.wordpress.com/2017/06/08/performance-improvement-with-date-time-api-of-java-8/) API than `SimpleDateFormat`. That class (and also `Date` and `Calendar`) has [tons of problems](https://stackoverflow.com/q/1571265) and [bad-design issues](https://stackoverflow.com/q/1969442), and using `java.time` is highly recommended. – uilon Apr 02 '18 at 17:08

I went with the solution suggested in the comments. I created a function def isAllDigits(x: String) = x forall Character.isDigit and use it to check whether the String is all digits or not.
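
For anyone who wants the whole thing in one place, the dispatch could look roughly like this. It's written in Java to match the java.time answer above, and it should translate to Scala almost line for line. The class name YearExtractor is made up, the fallback of 2017 is taken from the question, and I'm assuming the timestamp is in seconds (as 1520877600 appears to be) and that the caller passes the zone explicitly:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: picks the parser based on whether the input is all digits.
final class YearExtractor {

    static final int DEFAULT_YEAR = 2017; // fallback used in the question

    private static final DateTimeFormatter DATE_TIME =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.S");

    // Java counterpart of `x forall Character.isDigit`, except that it
    // returns false for an empty string.
    static boolean isAllDigits(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isDigit);
    }

    // Returns the year for either input format, or DEFAULT_YEAR for null/empty input.
    // Malformed non-empty input still throws, as in the original functions.
    static int getYear(Object value, ZoneId zone) {
        if (value == null) {
            return DEFAULT_YEAR;
        }
        String text = value.toString().trim();
        if (text.isEmpty()) {
            return DEFAULT_YEAR;
        }
        if (isAllDigits(text)) {
            // Assumption: the timestamp is in seconds, not milliseconds.
            return Instant.ofEpochSecond(Long.parseLong(text)).atZone(zone).getYear();
        }
        return LocalDateTime.parse(text, DATE_TIME).getYear();
    }

    public static void main(String[] args) {
        System.out.println(getYear("2018-04-23 11:12:00.0", ZoneOffset.UTC)); // 2018
        System.out.println(getYear("1520877600", ZoneOffset.UTC));            // 2018
        System.out.println(getYear("", ZoneOffset.UTC));                      // 2017
    }
}
```

A getMonth variant would follow the same shape with getMonthValue() in place of getYear().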

ScalaBoy