I am trying to encode cyclical features for an ML algorithm in which the timestamp is a very important feature.
I want to transform the day of the month (the 'day' column of cyclic_df) into a cyclical variable, so that the 1st of a month comes right after the last day of the previous month. For example, 1 February (01.02) should be adjacent to 31 January (31.01), so the difference between the two days, considering only the day column, is 1 and not 30!
# Transform the cyclical features
cyclic_df['min_sin'] = np.sin(cyclic_df.minute*(2.*np.pi/59)) # Sine component of minute
cyclic_df['min_cos'] = np.cos(cyclic_df.minute*(2.*np.pi/59)) # Cosine component of minute
cyclic_df['hr_sin'] = np.sin(cyclic_df.hour*(2.*np.pi/23)) # Sine component of hour
cyclic_df['hr_cos'] = np.cos(cyclic_df.hour*(2.*np.pi/23)) # Cosine component of hour
cyclic_df['d_sin'] = np.sin(cyclic_df.day*(2.*np.pi/30)) # Sine component of day (help needed here)
cyclic_df['d_cos'] = np.cos(cyclic_df.day*(2.*np.pi/30)) # Cosine component of day (help needed here)
cyclic_df['mnth_sin'] = np.sin((cyclic_df.month-1)*(2.*np.pi/12)) # Sine component of month
cyclic_df['mnth_cos'] = np.cos((cyclic_df.month-1)*(2.*np.pi/12)) # Cosine component of month
The problem is the 30 that I divide by. Not every month has 30 days; months have 28, 29, 30, or 31 days. Each row of cyclic_df has a 'year', a 'month', and a 'day' column, so theoretically it should be possible to look up the right number of days for the given month. How can I replace that 30 (in the d_sin and d_cos lines above) with a value computed from the year and month columns, so that it is the correct month length and not always 30?
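To make the goal concrete, here is a minimal sketch of one possible way to get a per-row month length, assuming the 'year', 'month', and 'day' columns hold valid integer dates. The sample data is made up for illustration; `pd.to_datetime` can assemble dates from columns with these names, and `.dt.days_in_month` then gives the number of days in each row's month.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question.
cyclic_df = pd.DataFrame({
    "year":  [2018, 2018, 2020, 2019],
    "month": [1, 2, 2, 2],
    "day":   [31, 1, 29, 28],
})

# Assemble a datetime per row from the year/month/day columns,
# then read the length of that row's month (28, 29, 30 or 31).
dates = pd.to_datetime(cyclic_df[["year", "month", "day"]])
days_in_month = dates.dt.days_in_month

# Same formula as in the question, with the fixed 30 replaced by the
# per-row month length.
angle = 2.0 * np.pi * cyclic_df["day"] / days_in_month
cyclic_df["d_sin"] = np.sin(angle)
cyclic_df["d_cos"] = np.cos(angle)
```

With this, the last day of any month maps to a full turn of the circle, so the 1st of the following month sits exactly one step away from it, regardless of how long the month is.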
PS: It would be very nice if someone could tell me whether I am doing it right for the minute, hour, and month, which are also in the code above.
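One way to check this is to test whether the first and last values of a cycle collide. The sketch below assumes minutes run from 0 to 59, so the cycle has 60 distinct values; `encode_cyclic` is a hypothetical helper written for this check, not a library function. Under that assumption, dividing by 59 appears to put minute 0 and minute 59 on the same point, while dividing by 60 keeps them one step apart (the same reasoning would suggest 24 rather than 23 for hours).

```python
import numpy as np

def encode_cyclic(values, period):
    """Map values in [0, period) to (sin, cos) coordinates on the unit circle."""
    angle = 2.0 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)

minutes = np.arange(60)  # assumption: minute takes the values 0..59

# Period 59 (as in the question): minute 59 completes a full turn and
# lands on the same point as minute 0.
sin59, cos59 = encode_cyclic(minutes, 59)
collides = bool(np.isclose(sin59[0], sin59[59]) and np.isclose(cos59[0], cos59[59]))

# Period 60: all 60 minutes are distinct, and the step from 59 back to 0
# is as long as the step from 0 to 1.
sin60, cos60 = encode_cyclic(minutes, 60)
points = np.column_stack([sin60, cos60])
step_59_to_0 = np.linalg.norm(points[59] - points[0])
step_0_to_1 = np.linalg.norm(points[1] - points[0])
```

The month encoding in the question, `(month-1)` with a period of 12, already follows this pattern: 12 distinct values spread over a period of 12.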
EDIT (after comments): Yes, I have a 'year' column. After changing the two lines to:
cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-575-532a308075e2> in <module>()
11 #cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/30)) # Cosinus component of day
12
---> 13 cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
14 cyclic_ext_df['d_cos'] = np.cos(cyclic_ext_df.day*(2.*np.pi/monthrange(cyclic_df.year, cyclic_ext_df.month)[1]))
15
~/anaconda/lib/python3.6/calendar.py in monthrange(year, month)
120 """Return weekday (0-6 ~ Mon-Sun) and number of days (28-31) for
121 year, month."""
--> 122 if not 1 <= month <= 12:
123 raise IllegalMonthError(month)
124 day1 = weekday(year, month, 1)
~/anaconda/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
1574 raise ValueError("The truth value of a {0} is ambiguous. "
1575 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576 .format(self.__class__.__name__))
1577
1578 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
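The traceback points at `if not 1 <= month <= 12` inside `calendar.monthrange`, which suggests the function expects plain integers and cannot compare a whole Series. A minimal sketch of a row-wise workaround, assuming the 'year' and 'month' columns hold valid integers (the sample data is hypothetical), calls `monthrange` once per row:

```python
from calendar import monthrange

import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question.
cyclic_ext_df = pd.DataFrame({
    "year":  [2018, 2020],
    "month": [2, 2],
    "day":   [28, 29],
})

# monthrange(year, month) takes scalar integers, so call it per row and
# keep only the second element of the result (the number of days).
days_in_month = np.array([
    monthrange(int(y), int(m))[1]
    for y, m in zip(cyclic_ext_df["year"], cyclic_ext_df["month"])
])

# Same formula as in the attempted fix, now with one month length per row.
angle = 2.0 * np.pi * cyclic_ext_df["day"] / days_in_month
cyclic_ext_df["d_sin"] = np.sin(angle)
cyclic_ext_df["d_cos"] = np.cos(angle)
```

A Python-level loop like this may be slow on large frames; a vectorized route through pandas datetime accessors is likely faster, but this version stays closest to the original attempt.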