-1

I have CVS files which contain date information in three separate colums which I would like to combine. The information I have is:

  • Two digit year (field 2)
  • Week number (field 3)
  • Day of week number (field 4)

How can I convert these 3 numbers into normal date format of the form YYYYMMDD?

My input file looks like:

740054,17,40,1,0000000000001,25,25,test1,1
740054,17,40,2,0000000000001,24,24,test2,1
740054,17,40,4,0000000000001,19,19,test3,1

And the expected output I would like to have is:

740054,20171002,0000000000001,25,25,test1,1
740054,20171003,0000000000001,24,24,test2,1
740054,20171005,0000000000001,19,19,test3,1

As an example for the first line: October 2, 2017 is the Monday (1) of the 40th week of the year 2017

Does anybody know how to do such a conversion?

kvantour
  • 25,269
  • 4
  • 47
  • 72
amber
  • 21
  • 3
  • 1
    Sorry, this is not the way StackOverflow works. Questions of the form "I want to do X, please give me tips and/or sample code" are considered off-topic. Please visit the [help] and read [ask], and especially read [Why is “Can someone help me?” not an actual question?](http://meta.stackoverflow.com/q/284236) – kvantour Jan 15 '19 at 10:55
  • Your question is very hard to answer as everything depends on the definition of Week number. [This post](https://stackoverflow.com/a/48420387/8344060) explains the differences between weeknumbers according to Europeans/Americans or Islamic culture. We would need a bit more information here. Do you have ISO week numbers? – kvantour Jan 15 '19 at 10:58
  • I must admit that this is not an easy one to compute. – kvantour Jan 15 '19 at 15:07
  • @kvantour wrt `this is not the way StackOverflow works` - apparently is is because she got an answer. If I were her I'd keep throwing up ambiguous, minimal effort questions as doing so appears to work just fine, – Ed Morton Jan 15 '19 at 17:17
  • @EdMorton It is true that I conveyed the obligatory message and still answered it. In contrast to a lot of other questions, this question is not one of those "how to add 3 columns together". The problem statement was clear and has an education factor which is much higher than a lot of other questions. The actual question, _Howto convert week number and weekday into an actual date_, is not an easy thing to do. Especially if you do not know where to start, even if you know awk or Perl` by heart. – kvantour Jan 16 '19 at 07:44
  • @EdMorton (cont) That is the main reason why I answered the question. I learned from it, other people might learn from it and other people might improve it. This is, what I believe, Stack Overflow is all about. – kvantour Jan 16 '19 at 07:45
  • @kvantour - I wonder if your answer is the best possible answer though? Since no-one else is providing answers, presumably because they agree with your statement that this isn't the way SO works, and the OP has accepted your answer rather than provide the missing information from her question to get other answers and from the comment under your answer, despite accepting it it sounds like the OP isn't going to use your answer anyway and is instead going to try to do something else and the question is now just 1 vote away from being closed I suppose we'll never know. Oh well... – Ed Morton Jan 16 '19 at 14:16
  • @amber if you ask a new question with the missing information then I expect you'll get a perl solution and since perl has an implementation of [strptime()](http://man7.org/linux/man-pages/man3/strptime.3.html) built in (sadly I couldn't persuade the gawk maintainers to [provide that for awk](https://lists.gnu.org/archive/html/bug-gawk/2014-08/msg00024.html)) it should be a trivial one-liner. – Ed Morton Jan 16 '19 at 14:32

1 Answers1

0

I will make the assumption that your week-number is according to the ISO 8601 definition (for other definitions see here). This ISO 8601 standard is widely used in the world: EU and most of other European countries, most of Asia, and Oceania

The ISO 8601 standard states the following:

  • There are 7 days in a week
  • The first day of the week is a Monday
  • The first week is the first week of the year which contains a Thursday. This means it is the first week with 4 days or more in January.

With this definition, it is possible to have a week number 53. These occur with the first of January is on a Friday (E.g. 2016-01-01, 2010-01-01). Or, if the year before was a leap year, also a Saturday. (E.g. 2005-01-01)

   December 2015               January 2016        
 Mo Tu We Th Fr Sa Su CW    Mo Tu We Th Fr Sa Su CW
     1  2  3  4  5  6 49                 1  2  3 53
  7  8  9 10 11 12 13 50     4  5  6  7  8  9 10 01
 14 15 16 17 18 19 20 51    11 12 13 14 15 16 17 02
 21 22 23 24 25 26 27 52    18 19 20 21 22 23 24 03
 28 29 30 31          53    25 26 27 28 29 30 31 04

Given the year, week_number and day_of_week, how can we reconstruct the date? The answer requires several steps and will compute the day of the year (doy) of the requested date.

To compute the day of the year doy we first need to figure out when the first-week starts as explained above. If Jan 01 is a Tuesday, then the first week only contains 6 days and not 7, while if Jan 01 is a Friday, the first week starts only the week after. So we can solve this by adding an offset. The offset can be found in the following table:

dow001 str: Mo Tu We Th Fr Sa Su
dow001 num: 01 02 03 04 05 06 07
offset    :  0 -1 -2 -3  3  2  1

and this offset is computed as 3-(dow001+2)%7

So with this, the day of the year is very easily computed:

doy = (week_number-1) * 7 + 3-(dow001+2)%7 + day_of_week

So having this, we can write the following GNU awk tool:

awk 'function compute_date(YYYY,CW,DOW) {
        dow001 = strftime("%u",mktime(YYYY " 01 01 00 00 00"))
        doy    =  (CW-1)*7 + (3 - (dow001+2)%7) + DOW
        return strftime("%Y%m%d",mktime(YYYY " 01 " doy " 00 00 00"))}
     }
     BEGIN { FS = OFS = "," }
     { datestr = compute_date(2000+$2,$3,$4) }
     { print $1, datestr , $5,$6,$7,$8,$9 }' file


740054,20171002,0000000000001,25,25,test1,1
740054,20171003,0000000000001,24,24,test2,1
740054,20171005,0000000000001,19,19,test3,1
kvantour
  • 25,269
  • 4
  • 47
  • 72
  • Thank you and apologies for the ambigous, minimal effort question. I must admit I think it is a bit complicated for an awk newbie like me. I may just need to find another solution to re write the files with the corresponding date. I have a separate file that contains the corresponding date of the 3 columns combined. – amber Jan 16 '19 at 01:15