I have a pandas dataframe that contains a year and week column:

year     week
2018     18
2019     17
2019     17

I'm trying to combine the year and week columns into a new 'isoweek' column using the isoweek library. I can't seem to figure out how to properly loop through the rows to create the object column. If I do something like:

df['isoweek'] = Week(df['year'],df['week'])

isoweek chokes on the vectorization. I've tried creating a basic list and appending it to my dataframe, like so:

 from isoweek import Week

 obj_list = []

 # Iterate over every row instead of a hard-coded 500
 for i in range(len(df)):
     year = df['year'][i]
     week = df['week'][i]
     w = Week(year, week)
     obj_list.append(w)

 df['isoweek'] = obj_list

But I end up with a simple tuple in the column.

The goal is to be able to use some of the isoweek library's operations to calculate date differences, like:

 df['isoweek'] - 4 
 >isoweek.Week(2019, 34)

Is it even possible to store an object like this in a dataframe column? If so, how does one go about it?

NoobsterNoob

3 Answers

One option is to build the Week objects in a loop.

First, set up the example dataframe

import pandas as pd
from isoweek import Week

df = pd.DataFrame({'year': [2018, 2019, 2019],
                   'week': [18, 17, 17]})

Loop through the dataframe, adding the isoweek to a list

ls_isoweek = []
for row in df.itertuples():
    # itertuples() yields namedtuples, so the columns are available by name
    ls_isoweek.append(Week(row.year, row.week))

The list looks like this

[isoweek.Week(2018, 18), isoweek.Week(2019, 17), isoweek.Week(2019, 17)]

This list can be accessed thusly

ls_isoweek[0] - 4

Produces this output

isoweek.Week(2018, 14)

However, the list can also be added back to the dataframe if you wish

df['isoweek'] = ls_isoweek

You can then do things like ...

df['isoweek_minus_4'] = df['isoweek'].apply(lambda x: x-4)

The new isoweek_minus_4 column then holds each week shifted back by four weeks, for example week 14 of 2018 in the first row.
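
For comparison, the same week arithmetic can be sketched with only the standard library, which makes it easy to cross-check results such as Week(2018, 14) above. This is a minimal sketch and not part of isoweek: shift_iso_week is a hypothetical helper name, and it assumes Python 3.8+ for date.fromisocalendar.

```python
from datetime import date, timedelta


def shift_iso_week(year, week, delta_weeks):
    """Return the (ISO year, ISO week) that lies delta_weeks away from the given week.

    Hypothetical helper for illustration; it is not part of isoweek.
    """
    # Monday (ISO weekday 1) of the starting week.
    monday = date.fromisocalendar(int(year), int(week), 1)
    shifted = monday + timedelta(weeks=int(delta_weeks))
    # isocalendar() yields (ISO year, ISO week number, ISO weekday).
    iso_year, iso_week, _ = shifted.isocalendar()
    return iso_year, iso_week


print(shift_iso_week(2018, 18, -4))  # (2018, 14)
print(shift_iso_week(2019, 2, -4))   # (2018, 50), crossing the year boundary
```

Going through an actual date and back means the year boundary is handled by the calendar itself, with no assumption that a year has 52 weeks.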

the_good_pony

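As an alternative, you can use pandas' built-in datetime handling. Note that this adds 7 * week days to 1 January of the given year, so week_start is an approximate date and not necessarily the Monday of the ISO week: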

df['week_start'] = pd.to_datetime(df['year'].astype(str), format='%Y') + pd.to_timedelta(df['week'].mul(7).astype(str) + ' days')

# Output:
    week    year    week_start
0   18  2018    2018-05-07
1   17  2019    2019-04-30
2   17  2019    2019-04-30

Calculating time differences is pretty straightforward here:

# Choose 7 weeks
n_weeks = pd.to_timedelta(7, unit='W')

# Adding is simple
df['week_start'] + n_weeks

# Output
0   2018-06-25
1   2019-06-18
2   2019-06-18

For more on this, read: Pandas: How to create a datetime object from Week and Year?
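
If the goal is the Monday that actually starts each ISO week, one possible sketch is to build the dates with the standard library's date.fromisocalendar (Python 3.8+) and hand them to pandas. For this data that gives 2018-04-30 and 2019-04-22. The column name iso_monday is made up for this example, and Series.dt.isocalendar() assumes pandas 1.1 or newer.

```python
from datetime import date

import pandas as pd

df = pd.DataFrame({"year": [2018, 2019, 2019], "week": [18, 17, 17]})

# Monday (ISO weekday 1) of each ISO week; int() converts NumPy integers to plain ints.
df["iso_monday"] = pd.to_datetime(
    [date.fromisocalendar(int(y), int(w), 1) for y, w in zip(df["year"], df["week"])]
)

# Plain datetime arithmetic then works as usual, e.g. going back four weeks.
four_weeks_earlier = df["iso_monday"] - pd.to_timedelta(4, unit="W")

# Round trip: recover the ISO year and week from the dates.
iso = df["iso_monday"].dt.isocalendar()

print(df)
print(four_weeks_earlier)
print(iso[["year", "week"]])
```

Because the column is a regular datetime column, the timedelta arithmetic shown earlier applies to it unchanged.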

Yaakov Bressler

A little late, but if anyone else is still looking to use a solution of this form as I was, you could use lambda functions along with apply. For the dataframe below (with int64 dtypes),

    year   week
0   2018    18
1   2019    17
2   2019    17

Now we use isoweek to appropriately parse the data,

from isoweek import Week
df.apply(lambda row : Week(row["year"],row["week"]),axis=1)

This produces the output,

0    (2018, 18)
1    (2019, 17)
2    (2019, 17)
dtype: object

You could also map each (year, week) pair to a date object by combining this approach with the answer at https://stackoverflow.com/a/7687085.

df.apply(lambda row : Week(int(row["year"]),int(row["week"])).monday(),axis=1)

The int() calls look a little redundant, but pandas stores these columns as int64 by default, and those NumPy integers don't appear to work correctly with isoweek. This produces the output,

0    2018-04-30
1    2019-04-22
2    2019-04-22
dtype: object
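
The tuple-looking printout further up does not mean the Week objects were lost. As far as I can tell, isoweek.Week is a namedtuple subclass, and pandas seems to print tuple-like values in plain tuple form, so the column can still hold the real objects. The sketch below illustrates this with DemoWeek, a hypothetical stand-in defined only for this example (so that it runs without isoweek installed); it assumes Python 3.8+ for date.fromisocalendar.

```python
from collections import namedtuple
from datetime import date

import pandas as pd


class DemoWeek(namedtuple("DemoWeek", ("year", "week"))):
    """Hypothetical stand-in for isoweek.Week, defined only for this example."""

    __slots__ = ()

    def monday(self):
        # First day (ISO weekday 1) of this ISO week.
        return date.fromisocalendar(self.year, self.week, 1)


df = pd.DataFrame({"year": [2018, 2019, 2019], "week": [18, 17, 17]})

# int() turns the NumPy integers from the int64 columns into plain Python ints.
df["isoweek"] = [DemoWeek(int(y), int(w)) for y, w in zip(df["year"], df["week"])]

print(df["isoweek"])                         # may be displayed like plain tuples
print(type(df["isoweek"].iloc[0]).__name__)  # DemoWeek

# Methods of the stored objects remain available through apply.
mondays = df["isoweek"].apply(lambda w: w.monday())
print(mondays.tolist())
```

The column has dtype object either way, so operations on it run element by element through apply and are not vectorized.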