
Why is a date string in one format successfully parsed by strptime with an explicitly different format?

I need to accept only one explicitly specified date format for an API.

$ ruby -v
ruby 2.5.1p57 (2018-03-29 revision 63029) [x86_64-linux]
$ irb
irb(main):001:0> require 'date'
=> true
irb(main):002:0> Date.strptime('01-01-1970', '%Y-%m-%d')
=> #<Date: 0001-01-19 ((1721442j,0s,0n),+0s,2299161j)>
irb(main):003:0> Date.strptime('01-01-1970', '%Y-%m-%d').to_s
=> "0001-01-19"
irb(main):004:0> Date.strptime('01-01-1970', '%Y-%m-%d').year
=> 1
irb(main):005:0> Date.strptime('01-01-1970', '%Y-%m-%d').day
=> 19

Expected: the test passes

it 'raises an exception when wrong format' do
  expect { Date.strptime('01-01-1970', '%Y-%m-%d') }.to raise_exception(ArgumentError, 'invalid format')
end

Actual: expected ArgumentError with "invalid format" but nothing was raised

Alex Strizhak
  • Because of how Ruby parses dates it's perfectly valid, which you can show yourself by running it in the repl. – Dave Newton Jun 14 '19 at 19:20
  • Possible duplicate of [How to check if a string is a valid date](https://stackoverflow.com/questions/2955830/how-to-check-if-a-string-is-a-valid-date) – max pleaner Jun 14 '19 at 19:23
  • @DaveNewton did you see the format? It is `'%Y-%m-%d'`. What is `perfectly valid` here, `1970` as the day? – Alex Strizhak Jun 14 '19 at 19:33
  • @maxpleaner did you even read it? The question is `why did it happen?`, not `how to parse a date string?` – Alex Strizhak Jun 14 '19 at 19:35
  • @AlexeyStrizhak I did indeed. And yes, because of how it parses dates, as explained in detail by Sergio, it will parse to a valid date. You'll need to use a regex or a stricter date parsing mechanism. – Dave Newton Jun 14 '19 at 19:45
  • yup, I expect something like this: `[/\d{4}(-\d{2}){2}/]`, anyway thanks – Alex Strizhak Jun 14 '19 at 20:20
  • @AlexeyStrizhak: oh wow, it never occurred to me that one could use a quantifier here (for the `-\d{2}` part). While your version does the same thing, maybe a `/\d{4}-\d{2}-\d{2}/` will express the intent more clearly. – Sergio Tulentsev Jun 14 '19 at 20:30
  • @AlexeyStrizhak I'm just trying to help. You expected strptime to raise an error with these inputs, but it doesn't. Why not? Feel free to look at its source code. But as that linked answer says, there are other ways to go about accomplishing your _goal_ (which is to raise an error when parsing an invalid date) – max pleaner Jun 14 '19 at 20:46
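
The regex-then-parse approach suggested in these comments could look roughly like the sketch below. The helper name `parse_iso_date` and the constant `ISO_DATE_SHAPE` are made up for illustration and are not part of Ruby's `Date` API. The regex only checks the shape of the string; `Date.strptime` still decides whether the values form a real date.

```ruby
require 'date'

# Hypothetical pattern: exactly four digits, two digits, two digits,
# anchored at both ends so nothing may precede or follow the date.
ISO_DATE_SHAPE = /\A\d{4}-\d{2}-\d{2}\z/

# Hypothetical helper, not a Ruby built-in.
def parse_iso_date(string)
  # Reject anything that is not shaped like YYYY-MM-DD before parsing.
  raise ArgumentError, 'invalid format' unless ISO_DATE_SHAPE.match?(string)

  # strptime still raises ArgumentError for impossible values such as 2019-02-30.
  Date.strptime(string, '%Y-%m-%d')
end

puts parse_iso_date('1970-01-01')

begin
  parse_iso_date('01-01-1970')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

With a helper like this, the spec from the question would wrap `parse_iso_date('01-01-1970')` instead of calling `Date.strptime` directly.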

1 Answer


You have two hidden questions, I think.

Why is 01 a valid match for %Y (which means "year including century")?

Because why assume 4-digit years? If Ruby did, you wouldn't be able to specify 3-digit years (for example, 882, the year Kiev became the capital of Rus). Or maybe in this case you did mean year 1. Ruby has no idea.

Why is 1970 a match for %d?

Because that's how strptime(3) works, and Ruby's version is supposed to be compatible with it. Once the format descriptor %d ("day, 1-31") is satisfied with 19, the rest of the string is not processed. The man page says:

The return value of the function is a pointer to the first character not processed in this function call. In case the input string contains more characters than required by the format string the return value points right after the last consumed input character.
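
Ruby has no pointer to return, but the same information can be inspected through `Date._strptime`, which returns the parsed fields as a hash and reports unconsumed input under the `:leftover` key. A small sketch using the string from the question:

```ruby
require 'date'

# Date._strptime returns a hash of parsed fields instead of a Date object.
fields = Date._strptime('01-01-1970', '%Y-%m-%d')

# %Y consumed "01", %m consumed "01", %d consumed "19".
puts "year=#{fields[:year]} month=#{fields[:mon]} day=#{fields[:mday]}"

# The trailing "70" was never consumed by any format descriptor.
puts "leftover=#{fields[:leftover].inspect}"
```

A non-nil `:leftover` is therefore one way to detect that the input was longer than the format accounts for. It does not catch a short year such as `01`, which is still a legitimate match for `%Y`.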

Sergio Tulentsev
  • hm... why does `Ruby` split the string and analyze it part by part? I mean, if I provide an explicit format, then I expect the parsing to happen within the scope of that whole format. I have no idea why it should not treat the format as a whole – Alex Strizhak Jun 14 '19 at 19:44
  • @AlexeyStrizhak: maybe you simply have wrong expectations about what format specifiers do or how strict they are. If you _must_ reject the date strings from your question (as not conforming to your format), check the string with a regex before parsing it as date. This is actually a problem you can see in coding interviews (I did). – Sergio Tulentsev Jun 14 '19 at 19:50
  • Makes sense, thank you. "wrong expectations": yeah, it looks that way. Honestly, I'll re-read your answer and try to understand it. Thank you! – Alex Strizhak Jun 14 '19 at 19:56
  • I was unaware of this behaviour (re day), and am wondering why the Ruby monks chose to do it that way. After all, if we are to permit `Date.strptime('28-03-2019', '%Y-%m-%d')`, why assume the day is `"20"` followed by the string `"19"` and not the day `"2"` followed by the string `"019"`? Moreover, if this were an inadvertent reversing of the format string, by not raising an exception Ruby is making it harder for the coder to track down the bug. Can you or anyone suggest a reason for this language design decision? – Cary Swoveland Jun 14 '19 at 21:15
  • @CarySwoveland: I'm betting on compatibility with C stdlib. I mean, if we're copying `printf` and its format string behaviour, why deviate in implementation of this one? Principle of least surprise and all. – Sergio Tulentsev Jun 15 '19 at 08:09
  • @CarySwoveland "why assume the day is "20" followed by the string "19" and not the day "2" followed by the string "019"" - well, day _is_ defined as having values from 1 to 31. So you _have_ to look at two digits. But a day will never have three digits. Unlike a year. – Sergio Tulentsev Jun 15 '19 at 08:14
  • @CarySwoveland: some more examples of parsing strings as days: `2o10` => 2 (2 is not followed by a digit); `3110` => 31, but `3210` => ArgumentError (day can't be 32. We also can't fallback to splitting this into 3 + 210, because we have no basis to assume that it'd be the correct recovery. Data is ambiguous. Panic and let the user deal with it). It all sounds very logical to me. – Sergio Tulentsev Jun 15 '19 at 08:47
  • It's going to be tough, but I guess I'll have to live with it. – Cary Swoveland Jun 16 '19 at 03:21
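
The day-parsing examples from the comments above can be checked with `Date._strptime`, which returns `nil` on a failed match rather than raising. It is used here instead of `Date.strptime` so the result does not depend on whether the current month has a 31st day. This is only an illustrative sketch; the variable name `results` is arbitrary.

```ruby
require 'date'

# Inputs taken from the comment thread, each parsed with the bare %d descriptor.
inputs = ['2o10', '3110', '3210']

# Map each input to the fields hash, or nil when %d cannot match.
results = inputs.map { |input| [input, Date._strptime(input, '%d')] }.to_h

results.each do |input, fields|
  if fields
    puts "#{input}: day=#{fields[:mday]}, leftover=#{fields[:leftover].inspect}"
  else
    # 32 is outside 1..31, so the match fails; Date.strptime would raise ArgumentError.
    puts "#{input}: no match"
  end
end
```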