0

I have a dataframe where the date column is in format format='%Y-W%W-%w'. I am converting from the 2018-W01 etc. to an actual date using pd.to_datetime(urldict[key]['date']+'-1', format='%Y-W%W-%w'), but the data appears to be shifted incorrectly for 2020/2021, I'm guessing because of the leap-year.

Subsequently, it creates two entries for 01-04-2021, with the first entry being what would be 2020-W53. The data going back is also misaligned.

I'm not sure how to fix this as I assumed that the datetime library would account for it.

Pre-conversion:

date    region  total
2020-W51    africa  1
2020-W52    africa  2
2020-W53    africa  3
2021-W01    africa  4

Post-conversion:

date    region  total
12/21/2020  africa  1
12/28/2020  africa  2
1/4/2021    africa  3
1/4/2021    africa  4
visualnotsobasic
  • 428
  • 3
  • 17
  • 3
    What do you mean by creating two entries? If you're just converting data, it should be one output for each input. Can you show sample code and the erroneous results? – Barmar Mar 09 '21 at 21:49
  • Hi @Barmar, sorry. There's really nothing going on in the code apart from the code in the question, but I have posted the dataframe before and after the code is exectued. – visualnotsobasic Mar 09 '21 at 22:16
  • What should the correct date be for week 53 in 2020? – Barmar Mar 09 '21 at 22:18
  • I misunderstood what you meant by "creates two entries". I thought it was turning one input into two outputs. – Barmar Mar 09 '21 at 22:19
  • It looks to me as though they should be all shifted back 1, i.e. Week 51 should begin Dec. 14th, Week 51 should begin Dec. 21, Week 53 should begin Dec. 28th, however there are multiple years in the dataframe and the beginning of 2020 is aligned which is what makes me think it is the leap year. – visualnotsobasic Mar 09 '21 at 22:20
  • The basic problem is that a year isn't an exact number of weeks. It's 52 weeks + 1 or 2 days, depending on whether it's a leap year – Barmar Mar 09 '21 at 22:22
  • I don't think the problem has anything to do with leap years. Week 53 of year N is the same as week 1 of year N+1. – Barmar Mar 09 '21 at 22:23
  • @Barmar so what would you suggest instead? – visualnotsobasic Mar 09 '21 at 22:32
  • What result are you expecting? Why do you have two entries in the df for the same week? Garbage in, garbage out. – Barmar Mar 10 '21 at 00:16
  • @Barmar they're two separate weeks, how is that garbage in...? – visualnotsobasic Mar 10 '21 at 00:38
  • A year only has 52 full weeks. The 53rd week of a year is really the first week of the next year. – Barmar Mar 10 '21 at 00:41
  • yes, but as you mentioned some of the days are in the previous year. this is how the data is provided from the service, this is not by choice. do you know a method of handling it? – visualnotsobasic Mar 10 '21 at 00:52
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/229706/discussion-between-barmar-and-visualnotsobasic). – Barmar Mar 10 '21 at 00:59
  • Thank you @Barmar, not sure I know how to/can initiate a message – visualnotsobasic Mar 10 '21 at 01:07
  • I've been waiting in the chat room for you to join by clicking on that link. – Barmar Mar 10 '21 at 01:08

1 Answers1

1

It seems you need ISO 8601 year/week/weekday, so the correct formatting directive would be '%G-W%V-%u' (see the docs, end of that section). For

df

date  region  total
0  2020-W51  africa      1
1  2020-W52  africa      2
2  2020-W53  africa      3
3  2021-W01  africa      4

that would look like

pd.to_datetime(df['date']+'-1', format='%G-W%V-%w')

0   2020-12-14
1   2020-12-21
2   2020-12-28
3   2021-01-04
Name: date, dtype: datetime64[ns]

Related: Python - Get date from day of week, year, and week number

FObersteiner
  • 22,500
  • 8
  • 42
  • 72