I found a Node.js solution but couldn't find a Python one. I have a DataFrame that looks like this:

   Year  Month      Week  numOfTrips
0  2011  July       30    2608
1  2011  August     31    6852
2  2011  August     32    8092
3  2011  August     33    7674
4  2011  August     34    7065
5  2011  August     35    3896
6  2011  September  35    4182
7  2011  September  36    7315
8  2011  September  37    8929
9  2011  September  38    7822
10 2011  September  39    6508
11 2011  October    39    1848
12 2011  October    40    9233
13 2011  October    41    7818
14 2011  October    42    7627
 .   .      .        .      .
 .   .      .        .      .
 .   .      .        .      .

I want to replace each week-of-year number with the week number within its month. How can I do this?

The result would look like:

   Year  Month      Week  numOfTrips
0  2011  July       5     2608
1  2011  August     1     6852
2  2011  August     2     8092
3  2011  August     3     7674
4  2011  August     4     7065
5  2011  August     5     3896
6  2011  September  1     4182
7  2011  September  2     7315
8  2011  September  3     8929
9  2011  September  4     7822
10 2011  September  5     6508
11 2011  October    1     1848
12 2011  October    2     9233
13 2011  October    3     7818
14 2011  October    4     7627
 .   .      .        .      .
 .   .      .        .      .
 .   .      .        .      .
Bn.F76
  • Turn your data into `date` objects and then check out this answer: https://stackoverflow.com/questions/3806473/python-week-number-of-the-month – rdas Apr 13 '19 at 16:52
  • @DroidX86 you mean `datetime`? As in `df['Week'] = pd.to_datetime(df.Week)` ? – Bn.F76 Apr 13 '19 at 17:01
  • Yup try that. best way to get date info is to use actual `datetime` objects. – rdas Apr 13 '19 at 17:02
  • For anyone who is still trying to figure this out (like me): Those week numbers are "The week ordinal of the year", returned by [Series.dt.week](https://pandas.pydata.org/pandas-docs/stable/reference/series.html) – Bn.F76 Apr 14 '19 at 19:32

2 Answers

I found a relatively long and ugly solution that seems to work. If there are errors/bugs or a cleaner implementation, let me know.

My approach, using the example row 2011 July 30 2608:

  1. Get the weeks for the appropriate month in its respective year

import calendar
import numpy as np

m = list(calendar.month_name).index('July') # turn month name into int
cal = calendar.Calendar()
weeks = cal.monthdatescalendar(2011, m) # get weeks for that month in the year
  2. Get every week for the year you are dealing with. In our example it's 2011.

yr = cal.yeardatescalendar(2011, width=12)

This call fetches the whole year as a 4D list: month chunk (in our case, one chunk of 12 months), month, week, day. To query it, use something like yr[0][6][4][:] for all the days in the 5th week of July. Because of this nesting, the next step is to turn it into a 2D array that holds all the weeks.

flat = [week for month in yr[0] for week in month]
dates = np.array(flat)
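
To make the nesting concrete, here is a small check of the structure described above, using only the standard library. The names `july`, `fifth_week` and `has_repeats` are illustrative and not part of the original answer.

```python
import calendar

cal = calendar.Calendar()  # weeks start on Monday by default
yr = cal.yeardatescalendar(2011, width=12)

# Nesting: rows of months -> months -> weeks -> days (datetime.date objects)
july = yr[0][6]        # 7th month of the only row
fifth_week = july[4]   # 5th week of July 2011
print(fifth_week[0], fifth_week[-1])  # 2011-07-25 2011-07-31

# Flatten to a plain list of weeks; a week that spans two months
# is listed under both, so it appears twice in a row
flat = [week for month in yr[0] for week in month]
has_repeats = any(a == b for a, b in zip(flat, flat[1:]))
print(has_repeats)  # True
```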

The dates array will hold duplicates, because cal.yeardatescalendar() lists a week that straddles two months under both months. We remove these:

new_dates = []
for i in range(len(dates)):
    # keep a week only if it differs from the one before it
    if not np.array_equal(dates[i], dates[i-1]):
        new_dates.append(dates[i])

At this point you have a 2D array of weeks that lines up with your week numbers. For example, if you look up week 30 of 2011 in a week-number calendar, you'll see it's the 5th week of July.
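
The same de-duplication can also be sketched without NumPy. This is an alternative under the assumption that repeated weeks should simply collapse to their first occurrence; `unique_weeks` is a hypothetical name, not from the original answer.

```python
import calendar

cal = calendar.Calendar()
yr = cal.yeardatescalendar(2011, width=12)
flat = [week for month in yr[0] for week in month]

# dict.fromkeys keeps the first occurrence of each key and preserves order;
# each week is turned into a tuple so it can be used as a dict key
unique_weeks = list(dict.fromkeys(tuple(week) for week in flat))

the_week = unique_weeks[30]
print(the_week[0], the_week[-1])  # 2011-07-25 2011-07-31
```

Indexing `unique_weeks` then plays the same role as indexing `new_dates` in the steps that follow.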

  3. Next, we use the "correct" week-list array to look up our week, e.g. week 30

the_week = new_dates[30]

  4. Finally, we check which week of the month matches our week. When we get a match, we know it's that week number.

for week in range(len(weeks)): 
    if np.array_equal(the_week, np.array(weeks[week])):
        save_week_num = week+1
        print(save_week_num) # for our example it will print 5

Here it is in a function:

import calendar
import numpy as np

def week_of_month(year, month, y_week):
    m = list(calendar.month_name).index(month)  # month name -> month number
    cal = calendar.Calendar()
    weeks = cal.monthdatescalendar(year, m)  # weeks of that month

    # every week of the year; weeks that straddle two months appear twice
    yr = cal.yeardatescalendar(year, width=12)
    flat = [week for mth in yr[0] for week in mth]
    dates = np.array(flat)

    # drop consecutive duplicates
    new_dates = []
    for i in range(len(dates)):
        if not np.array_equal(dates[i], dates[i-1]):
            new_dates.append(dates[i])

    the_week = new_dates[y_week]  # for 2013, replace this with: the_week = new_dates[y_week-1]

    number = 0  # stays 0 if the week is not part of the given month
    for i, week in enumerate(weeks):
        if np.array_equal(the_week, np.array(week)):
            number = i + 1

    return number

To test it on the DataFrame in the question, run:

df['Week'] = df.apply(lambda row: week_of_month(row.Year, row.Month, row.Week), axis=1)

Bn.F76

We can do the following:

First we convert your Year + Week to datetime using pandas.to_datetime:

import pandas as pd

df['Date'] = pd.to_datetime(df['Year'].astype(str) + '-' + df['Week'].astype(str) + '-1', format='%Y-%W-%w')
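
To see what the format string does on a single value, the same directives can be tried with the standard library's `datetime.strptime`. `%W` is the week of the year with Monday as the first day of the week (days before the first Monday fall in week 0), and `%w` is the weekday with 0 for Sunday, so the appended `-1` selects the Monday of that week. A minimal sketch, with illustrative variable names:

```python
from datetime import datetime

# Year 2011, week 30 and week 35, weekday 1 (Monday)
monday_wk30 = datetime.strptime("2011-30-1", "%Y-%W-%w")
monday_wk35 = datetime.strptime("2011-35-1", "%Y-%W-%w")

print(monday_wk30.date())  # 2011-07-25
print(monday_wk35.date())  # 2011-08-29
```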

Then we define the function from the answer linked in the comments above:

from math import ceil

def week_of_month(dt):
    """ Returns the week of the month for the specified date.
    """

    first_day = dt.replace(day=1)

    dom = dt.day
    adjusted_dom = dom + first_day.weekday()

    return int(ceil(adjusted_dom/7.0))

Finally, we apply the function to our Date column to get the week number within the month:

df['Week'] = df['Date'].apply(week_of_month)

Which yields:

print(df)
   Year      Month  Week  numOfTrips       Date
0  2011       July     5        2608 2011-07-25
1  2011     August     1        6852 2011-08-01
2  2011     August     2        8092 2011-08-08
3  2011     August     3        7674 2011-08-15
4  2011     August     4        7065 2011-08-22
5  2011     August     5        3896 2011-08-29
6  2011  September     5        4182 2011-08-29

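Note that it's not 100% accurate for the last row, because you don't have exact dates: the Date column is built from Year and Week only, so it always points to the Monday of that week. It would be 100% accurate if we had exact dates.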
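
One possible way to handle rows like the last one is to let the Month column decide which end of a straddling week to use. The sketch below assumes weeks start on Monday (as `%W` does), that Week holds integers, and that Month holds English month names. The helper `week_of_month_from_row` is hypothetical and not part of the original answer. It uses only the standard library so it can be checked in isolation.

```python
import calendar
from datetime import datetime, timedelta
from math import ceil

def week_of_month(d):
    """Week of the month for a date, with weeks starting on Monday."""
    first_day = d.replace(day=1)
    return int(ceil((d.day + first_day.weekday()) / 7.0))

def week_of_month_from_row(year, month_name, year_week):
    """Hypothetical helper: pick a day of the week that lies in the named month."""
    month = list(calendar.month_name).index(month_name)
    monday = datetime.strptime(f"{year}-{year_week}-1", "%Y-%W-%w")
    sunday = monday + timedelta(days=6)
    # If the week starts in the previous month, use its Sunday instead
    day = monday if monday.month == month else sunday
    if day.month != month:
        raise ValueError(f"week {year_week} of {year} has no day in {month_name}")
    return week_of_month(day)

print(week_of_month_from_row(2011, "August", 35))     # 5
print(week_of_month_from_row(2011, "September", 35))  # 1
```

On the DataFrame from the question this could be applied with `df.apply(lambda row: week_of_month_from_row(row.Year, row.Month, row.Week), axis=1)`.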

Erfan
  • What do you mean exact dates? Week 35 of Sep, 2011 cannot be the 5th week if Week 39 of September is also the 5th week... – Bn.F76 Apr 13 '19 at 17:38
  • We extracted the `Date` column from `Year` + `Week`. As you can see, our `Date` column can only be accurate at the week level, not the day level. That's why it gives row 5 and row 6 the same date. – Erfan Apr 13 '19 at 17:44
  • OK, I interpreted your comment as saying I don't have enough data to make an accurate calculation. Using `Month` in the equation, I should be able to get the correct week, which is what I'm trying now. – Bn.F76 Apr 13 '19 at 17:47
  • Did you look at my answer? Then you might understand better what I mean – Erfan Apr 13 '19 at 17:50