
I am trying to loop over a date range and have copied the answer on this post.

My script 'loop_run_local.sh':

d=2019-11-12
while [ "$d" != 2019-11-14 ]; do
  echo 'Starting Data Extract...'
  echo 'run date is: ' $d
  echo 'Starting Transactions Extract...'
  python3.7 flagship_ecom/run_transactions.py $d    
  d=$(date -I -d "$d + 1 day")
done

This seems to work for the first iteration only, not for subsequent dates, so I guess something is not working with d=$(date -I -d "$d + 1 day")?

Terminal output before I hit ctrl+z to stop the loop:

bash-3.2$ ./flagship_ecom/loop_run_local.sh
Starting Data Extract...
run date is:  2019-11-12
Starting Transactions Extract...
pageToken is:0 : 2019-11-12
/Users/macuser/Library/Python/3.7/lib/python/site-packages/tqdm/std.py:658: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
  from pandas import Panel
Pandas Apply: 100%|██████████████████████████████████████████████████████████████████████████████| 2737/2737 [00:00<00:00, 3156.76it/s]
date: illegal option -- I
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
Starting Data Extract...
run date is:
Starting Transactions Extract...
Traceback (most recent call last):
  File "flagship_ecom/run_transactions.py", line 17, in <module>
    start_date = sys.argv[1]
IndexError: list index out of range
date: illegal option -- I
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
Starting Data Extract...
run date is:
Starting Transactions Extract...
Traceback (most recent call last):
  File "flagship_ecom/run_transactions.py", line 17, in <module>
    start_date = sys.argv[1]
IndexError: list index out of range
date: illegal option -- I
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
Starting Data Extract...
run date is:
Starting Transactions Extract...
^Z
[2]+  Stopped                 ./flagship_ecom/loop_run_local.sh

This line in the output:

run date is: 2019-11-12

Corresponds to this line within the .sh script:

echo 'run date is: ' $d

Since this is blank for every date after the starting date, I presume there's an issue with how I increment the date?

Here is the output when run with bash -x, as suggested in the comments:

bash-3.2$ bash -x ./flagship_ecom/loop_run_local.sh
+ d=2019-11-12
+ '[' 2019-11-12 '!=' 2019-11-14 ']'
+ echo 'Starting Data Extract...'
Starting Data Extract...
+ echo 'run date is: ' 2019-11-12
run date is:  2019-11-12
+ echo 'Starting Transactions Extract...'
Starting Transactions Extract...
+ python3.7 flagship_ecom/run_transactions.py 2019-11-12
pageToken is:0 : 2019-11-12
/Users/macuser/Library/Python/3.7/lib/python/site-packages/tqdm/std.py:658: FutureWarning: The Panel class is removed from pandas. Accessing it from the top-level namespace will also be removed in the next version
  from pandas import Panel
Pandas Apply: 100%|██████████████████████████████████████████████████████████████████████████████| 2737/2737 [00:00<00:00, 2971.93it/s]
++ date -I -d '2019-11-12 + 1 day'
date: illegal option -- I
usage: date [-jnRu] [-d dst] [-r seconds] [-t west] [-v[+|-]val[ymwdHMS]] ...
            [-f fmt date | [[[mm]dd]HH]MM[[cc]yy][.ss]] [+format]
+ d=
+ '[' '' '!=' 2019-11-14 ']'
+ echo 'Starting Data Extract...'
Starting Data Extract...
+ echo 'run date is: '
run date is:
+ echo 'Starting Transactions Extract...'
Starting Transactions Extract...
+ python3.7 flagship_ecom/run_transactions.py
Traceback (most recent call last):
  File "flagship_ecom/run_transactions.py", line 17, in <module>
    start_date = sys.argv[1]
IndexError: list index out of range
^Z
[8]+  Stopped                 bash -x ./flagship_ecom/loop_run_local.sh
bash-3.2$
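The trace pinpoints why the loop never ends: `date -I -d ...` prints a usage error, `d` becomes empty, and `[ '' != 2019-11-14 ]` stays true forever. A minimal sketch of the same loop with a guard that stops as soon as the date computation fails. It still uses the GNU-only `date -I -d` syntax from the question, so on macOS it exits with an error instead of spinning; the Python call is left commented out.

```shell
#!/bin/bash
# Sketch: the question's loop with a guard on the date computation.
# `date -I -d` is GNU syntax; with macOS's BSD date the guard fires and the script exits.

d=2019-11-12
while [ "$d" != 2019-11-14 ]; do
  echo 'run date is: ' "$d"
  # python3.7 flagship_ecom/run_transactions.py "$d"

  # The assignment's exit status is date's, so a failing date is caught here,
  # and an empty result is rejected as well
  if ! d=$(date -I -d "$d + 1 day") || [ -z "$d" ]; then
    echo "could not compute the next date; stopping" >&2
    exit 1
  fi
done
```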

I am using a Mac.

Doug Fir
  • Run `bash -x yourscript` to log each command it runs, so we don't need guesswork about what the values are. – Charles Duffy Dec 24 '19 at 20:29
  • Also, `date` is not part of bash -- it's provided by your OS vendor, so we need to know what OS you run. – Charles Duffy Dec 24 '19 at 20:30
  • `$d` is blank after the first iteration because the `date` command is erroring, producing blank output; the `illegal option -- I` tells me you don't have the same version of `date` being used in the answer you reference, which looks to rely on GNU `date` – landru27 Dec 24 '19 at 20:30
  • Hi, added output of calling with -x above. Also, I am on Mac – Doug Fir Dec 24 '19 at 20:34
  • If you're using Python already, you could just move the outer loop into your Python code, which is generally much better at handling dates than a shell script. – larsks Dec 24 '19 at 20:53
  • I keep seeing references to gdate but could not get it to work. The answers linked as duplicates when this post was closed are not readable, and the third one does not apply to Mac – Doug Fir Dec 24 '19 at 20:57
  • `gdate` is the name given to GNU date when installed with Macports or Homebrew. And the very first linked duplicate has a section of its answer that's explicitly correct for the MacOS/BSD `date` implementation, whereas the one in the answer here is buggy (not every day has exactly 86400 seconds -- leap seconds are a thing that exist). – Charles Duffy Dec 24 '19 at 22:14
  • @CharlesDuffy : no, my script is not buggy; `date` is unaffected by leap seconds; see the notes I've added to my answer, and the associated references – landru27 Dec 27 '19 at 02:40
  • Yup -- the bug would be more serious *if* it worked the way the OP's code does, using the YYYY-mm-dd datestamp as the base for calculating the next date, but since it doesn't, that's not so bad (though I still don't see why anyone would use it in preference to tools that can add "a day" and adjust accordingly). – Charles Duffy Dec 27 '19 at 02:59
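As the comments point out, macOS ships BSD `date`: its usage message above lists no `-I`, and its `-d` takes a `dst` value rather than a date string. The loop in the question assumes GNU `date`, which Homebrew or MacPorts install as `gdate`. A minimal sketch of a `next_day` helper (a hypothetical name) that uses GNU syntax when `date --version` succeeds and BSD's `-j -v+1d -f` otherwise; it assumes `date --version` fails on BSD `date`, as it does on macOS's `/bin/date`.

```shell
#!/bin/bash
# Sketch: compute the next calendar day with either GNU or BSD date.
# next_day is a hypothetical helper name, not part of any tool.

next_day() {
  if date --version >/dev/null 2>&1; then
    # GNU date (Linux, or gdate from Homebrew/MacPorts): -d parses the expression
    date -d "$1 + 1 day" '+%F'
  else
    # BSD date (macOS): -j = don't set the clock, -f = input format, -v+1d = add one day
    date -j -v+1d -f '%Y-%m-%d' "$1" '+%F'
  fi
}

d=2019-11-12
while [ "$d" != 2019-11-14 ]; do
  echo 'run date is: ' "$d"
  # python3.7 flagship_ecom/run_transactions.py "$d"
  d=$(next_day "$d") || exit 1   # stop instead of looping with an empty $d
done
```

If GNU coreutils is installed on the Mac, calling `gdate -d "$d + 1 day" '+%F'` directly is the simpler alternative.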

1 Answer


I also happen to be using a Mac. Here is an alternative script that increments a count of seconds instead and prints the corresponding calendar date:

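#!/bin/bash

# start from the current time, in seconds since the Unix Epoch
dsecs=`date '+%s'`
# convert to a YYYY-mm-dd date (BSD/macOS date: -r takes epoch seconds)
ddate=`date -r "$dsecs" '+%F'`
while [ "$ddate" != 2019-12-31 ]; do
  echo 'run date is: ' "$ddate"
  # advance one day's worth of seconds, then re-derive the calendar date
  dsecs=$((dsecs + 86400))
  ddate=`date -r "$dsecs" '+%F'`
done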

Output:

run date is:  2019-12-24
run date is:  2019-12-25
run date is:  2019-12-26
run date is:  2019-12-27
run date is:  2019-12-28
run date is:  2019-12-29
run date is:  2019-12-30

You could use $ddate as the input for the programs you want to call from this script.

The above script starts the iteration at today's date; if you need to start at some other date, just replace the first dsecs= line with the Unix Epoch value of, say, noon on the date on which you want the iteration to start.
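A minimal sketch of that suggestion, assuming example start and end dates and a script that may run on either macOS or Linux. The answer's `date -r SECONDS` is BSD syntax (GNU `date -r` takes a reference file instead), so the hypothetical `to_date` helper switches to GNU's `date -d @SECONDS` where needed.

```shell
#!/bin/bash
# Sketch: the epoch-seconds loop, started at noon on a chosen date instead of "now".
# start_date, end_date and to_date are illustrative names (assumptions).

start_date=2019-11-12
end_date=2019-11-14   # the loop stops before processing this date, as in the original

# Epoch seconds for noon (local time) on start_date
if date --version >/dev/null 2>&1; then
  dsecs=$(date -d "$start_date 12:00:00" '+%s')                        # GNU date
else
  dsecs=$(date -j -f '%Y-%m-%d %H:%M:%S' "$start_date 12:00:00" '+%s') # BSD/macOS date
fi

# Format epoch seconds as YYYY-mm-dd in local time
to_date() {
  if date --version >/dev/null 2>&1; then
    date -d "@$1" '+%F'
  else
    date -r "$1" '+%F'
  fi
}

ddate=$(to_date "$dsecs")
while [ "$ddate" != "$end_date" ]; do
  echo 'run date is: ' "$ddate"
  # python3.7 flagship_ecom/run_transactions.py "$ddate"
  dsecs=$((dsecs + 86400))
  ddate=$(to_date "$dsecs")
done
```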


Regarding leap seconds

Constructive feedback is welcome, but the concerns raised in comments on this answer and on the question, that this script is "buggy" because it ignores leap seconds, are misplaced. Here's why:

  1. date does not take leap seconds into account, because Unix time does not take leap seconds into account [1]; adding 86,400 seconds to a Unix Epoch time always yields the same UTC time of day, exactly one calendar day later (in a local time zone, a daylight-saving change can shift the clock time by an hour, which a noon start absorbs)
  2. POSIX time also ignores leap seconds; see the discussion on this SO question and its references
  3. even if leap seconds were at play here, using 86,400 seconds to advance the clock by one whole day only fails when you start at midnight on a date that is going to have a leap second added; otherwise, 86,400 seconds later is "the next day"
  4. yes, because of leap seconds, adding 86,400 seconds repeatedly does produce a drift (but again, not for date) -- but there have only been 27 leap seconds so far, spaced an average of 20 months apart [2]; thus:

    4.1. it generally takes a span of years to even possibly be affected by this

    4.2. a starting time of a half-minute or more after midnight eliminates the effect of this drift entirely

    4.3. there haven't been any leap seconds since 12/31/2016, so this isn't a concern for dates between then and now

  5. future dates do not (yet) have leap seconds at all -- they are not predictable in the same way leap years are -- so calculating future dates is even less of a concern
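To put point 4 in numbers, under the hypothetical where leap seconds did accumulate (they do not for date, per point 1) and using the average spacing cited there: a noon start leaves 43,200 seconds of slack before the printed date could change, so the drift would need

$$43{,}200 \ \text{leap seconds} \times 20 \ \tfrac{\text{months}}{\text{leap second}} = 864{,}000 \ \text{months} = 72{,}000 \ \text{years}$$

and with only 27 leap seconds so far, a start 30 seconds after midnight already exceeds the total possible drift (27 s < 30 s).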

So, the operation of date itself makes repeatedly adding 86,400 seconds valid, and even if that weren't the case, the circumstances where leap seconds matter are narrow. As I noted in one of my comments, all real-world engineering is done in the context of the application at hand. In the wide, wide swath of circumstances where leap seconds don't amount to enough of an effect -- or don't exist at all -- a solution that does not take them into account is not buggy ... and also not over-engineered.
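Point 1 can be checked directly. A minimal sketch that starts at 2016-12-31 00:00:00 UTC (epoch 1483142400), the day that ended with the most recent leap second, adds 86,400 seconds, and formats both values in UTC. `fmt_utc` is a hypothetical helper that picks GNU (`-d @N`) or BSD (`-r N`) syntax.

```shell
#!/bin/bash
# Sketch: adding 86,400 seconds across the 2016-12-31 leap second still lands
# exactly on the next midnight, because Unix time counts every day as 86,400 s.

fmt_utc() {
  # Format epoch seconds as a UTC timestamp with GNU date (-d @N) or BSD date (-r N)
  if date --version >/dev/null 2>&1; then
    date -u -d "@$1" '+%F %T'
  else
    date -u -r "$1" '+%F %T'
  fi
}

start=1483142400            # 2016-12-31 00:00:00 UTC
next=$((start + 86400))

echo "start: $(fmt_utc "$start")"   # start: 2016-12-31 00:00:00
echo "next:  $(fmt_utc "$next")"    # next:  2017-01-01 00:00:00
```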

[1] https://en.wikipedia.org/wiki/Unix_time

[2] https://en.wikipedia.org/wiki/Leap_second

landru27
  • Note that use of backticks for command substitution is frowned on -- `$(...)` has been part of the POSIX sh standard since its initial publication in 1991, and is far better behaved (nests properly, doesn't require backslashes to have extra escaping). – Charles Duffy Dec 24 '19 at 20:48
  • (Also, see the "Answer Well-Asked Questions" section of [How to Answer](https://stackoverflow.com/help/how-to-answer), particularly including the bullet point regarding questions which "have already been asked and answered many times before"). – Charles Duffy Dec 24 '19 at 20:50
  • This ignores leap seconds, and will thus sometimes return an incorrect result. – Charles Duffy Dec 24 '19 at 22:17
  • @CharlesDuffy : thanks for your collective feedback; as for the leap seconds, all real-world engineering involves an amount of "tolerance"; leap seconds have thus far been inserted an average of once per 20 months, so using noon as the starting point, it would take 72,000 years for the scripted method to produce the wrong date; each person considering this script should decide for themselves if their situation can tolerate an accurate result for only the next 72,000 years – landru27 Dec 26 '19 at 03:56
  • Mmm? You don't need to accumulate a day's worth of leap seconds for this code to misbehave; you only need *one* day when you'd need to add `86401` seconds to get to the next date and only add `86400`, and the code never proceeds past that date. (OTOH, you could easily just change the hardcoded `86400` to `86401`, and I'd have no more room for objection). – Charles Duffy Dec 26 '19 at 05:02
  • @CharlesDuffy : I think you should re-visit your analysis of my script; I added notes to my answer on why leap seconds are a total non-issue; also, even if leap seconds were at play, my script would not get hung up on any date; finally, repeatedly adding 86,401 seconds, however, will *certainly* cause it to eventually skip a whole day – landru27 Dec 27 '19 at 02:39
  • Ahh, right -- my analysis assumed you were following the OP's original logic of converting back from UNIX time to a YYYY-mm-dd timestamp between loop cycles (thus dropping the overage), but that's not true. – Charles Duffy Dec 27 '19 at 02:56
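The comment about backticks versus `$(...)` is easiest to see with nesting. A small sketch using a hypothetical path assembled from names in the question: both lines extract the parent directory's name, but the backtick form needs its inner backticks escaped with backslashes.

```shell
#!/bin/bash
# Sketch: nested command substitution written with $(...) and with backticks.
path=/Users/macuser/flagship_ecom/loop_run_local.sh   # hypothetical example path

# $(...) nests directly; each level simply opens a new $( ... )
parent_dir=$(basename "$(dirname "$path")")

# Backticks only nest if the inner pair is escaped
parent_dir_bt=`basename \`dirname "$path"\``

echo "$parent_dir"      # flagship_ecom
echo "$parent_dir_bt"   # flagship_ecom
```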