While preprocessing data and engineering features for an ML algorithm, I found that the date and time are very important features for my problem, so I wanted to encode them as cyclical variables. The dataset has about 6k-7k data points, with day in the range 1-31, month 1-12, hour 00-23, and minute 00-59. I split the timestamp into year, month, day, hour, and minute columns.
I then encoded month, day, hour, and minute as described here https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time/ so the algorithm "gets the point" that, for example, 01 November and 31 October are (in terms of days) nearer to each other than 25 October and 31 October.
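To make the "nearer to each other" idea concrete, here is a minimal sketch using only the standard library. The helper names `encode` and `distance`, and the fixed period of 31 for the day of month, are assumptions for illustration and are not taken from the linked post.

```python
import math

def encode(value, period, offset=0):
    """Map a cyclic value to a (sin, cos) point on the unit circle."""
    angle = 2.0 * math.pi * (value - offset) / period
    return math.sin(angle), math.cos(angle)

def distance(a, b):
    """Euclidean distance between two encoded points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Day of month runs 1-31, so shift by 1 and assume a period of 31.
oct_31 = encode(31, period=31, offset=1)
nov_01 = encode(1, period=31, offset=1)
oct_25 = encode(25, period=31, offset=1)

near = distance(oct_31, nov_01)  # one step around the circle
far = distance(oct_25, oct_31)   # six steps around the circle
print(near < far)  # True
```

On the raw day numbers, 31 and 1 are 30 apart, while in the encoded space they are neighbours.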
Here is the code with which I transformed the components:
import numpy as np

# Transform the cyclical features.
# The divisor is the number of distinct values (60 minutes, 24 hours, 12 months),
# so the last value does not land on the same point as the first.
cyclic_df['min_sin'] = np.sin(cyclic_df.minute*(2.*np.pi/60)) # Sine component of minute
cyclic_df['min_cos'] = np.cos(cyclic_df.minute*(2.*np.pi/60)) # Cosine component of minute
cyclic_df['hr_sin'] = np.sin(cyclic_df.hour*(2.*np.pi/24)) # Sine component of hour
cyclic_df['hr_cos'] = np.cos(cyclic_df.hour*(2.*np.pi/24)) # Cosine component of hour
# Days run 1-31, so shift to 0-30 first; a fixed period of 31 is an approximation for shorter months.
cyclic_df['d_sin'] = np.sin((cyclic_df.day-1)*(2.*np.pi/31)) # Sine component of day
cyclic_df['d_cos'] = np.cos((cyclic_df.day-1)*(2.*np.pi/31)) # Cosine component of day
cyclic_df['mnth_sin'] = np.sin((cyclic_df.month-1)*(2.*np.pi/12)) # Sine component of month
cyclic_df['mnth_cos'] = np.cos((cyclic_df.month-1)*(2.*np.pi/12)) # Cosine component of month
# Drop the raw columns; the model only needs the encoded features.
cyclic_df.drop(['minute', 'hour', 'day', 'month'], axis=1, inplace=True)
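One detail that may be worth checking in an encoding like this is the divisor. If it is one less than the number of distinct values (for example 59 for minutes 0-59), the first and last values end up on the same point of the circle. A minimal sketch with plain `math`; `point` and `gap` are illustrative helpers, not part of any library.

```python
import math

def point(value, divisor):
    """Return the (cos, sin) position of a value for a given divisor."""
    angle = 2.0 * math.pi * value / divisor
    return math.cos(angle), math.sin(angle)

def gap(value_a, value_b, divisor):
    """Euclidean distance between two encoded values."""
    ax, ay = point(value_a, divisor)
    bx, by = point(value_b, divisor)
    return math.hypot(ax - bx, ay - by)

# Divisor 59: minute 0 and minute 59 both sit at angle 0 (mod 2*pi).
collides = gap(0, 59, 59) < 1e-9

# Divisor 60: minute 59 is one step before minute 0, so they stay distinct.
distinct = gap(0, 59, 60) > 0.1

print(collides, distinct)  # True True
```

The same reasoning would apply to hours (24 rather than 23) and to days once they are shifted to start at 0.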
Now when I plot the transformed components, here is what I get:
My 3 questions:
1) How can I add the hour (00-23), month (1-12), and day (1-31) numbers to the plot?
2) How can I change the font size of each subplot's title? And how can I reduce the margin between the suptitle and the subplots? It's huge!
3) Can I use Seaborn to make the same plot, so that it looks better and uses a nicer color palette?
Here is the code I used to plot:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12,12))
fig.suptitle("Representation Of Cyclical Features", fontsize=16)
cyclic_df.sample(6000).plot.scatter('d_cos','d_sin', title='Cyclical Days Transformation', ax=axes[0,0]).set_aspect('equal')
cyclic_df.sample(6000).plot.scatter('mnth_cos','mnth_sin', title='Cyclical Months Transformation', ax=axes[0,1]).set_aspect('equal')
cyclic_df.sample(6000).plot.scatter('hr_cos','hr_sin', title='Cyclical Hours Transformation', ax=axes[1,0]).set_aspect('equal')
cyclic_df.sample(6000).plot.scatter('min_cos','min_sin',title='Cyclical Minutes Transformation', ax=axes[1,1]).set_aspect('equal')
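One possible direction for the labelling question, sketched under assumptions: since each value sits at a known angle, its number can be written just inside the circle with `Axes.text`. The helper `label_cycle` is hypothetical, and its `period` and `offset` arguments have to match whatever was used in the encoding. The usage lines are left as comments because they need the `fig` and `axes` objects from the plotting code.

```python
import math

def label_cycle(ax, values, period, offset=0, radius=0.85, fontsize=8):
    """Write each value next to its position on a (cos, sin) scatter plot.

    ax is assumed to be a matplotlib Axes with cos on x and sin on y.
    """
    for v in values:
        angle = 2.0 * math.pi * (v - offset) / period
        x = radius * math.cos(angle)
        y = radius * math.sin(angle)
        ax.text(x, y, str(v), ha="center", va="center", fontsize=fontsize)

# Hypothetical usage with the figure from the plotting code (requires matplotlib):
#   label_cycle(axes[1, 0], range(24), period=24)               # hours 0-23
#   label_cycle(axes[0, 1], range(1, 13), period=12, offset=1)  # months 1-12
#   axes[1, 0].set_title("Cyclical Hours Transformation", fontsize=10)
#   fig.subplots_adjust(top=0.92)  # moves the subplots up towards the suptitle
```

For the title size, `set_title(..., fontsize=...)` on each Axes should work. For the gap under the suptitle, `fig.subplots_adjust(top=...)` or the `y` argument of `fig.suptitle` may help, though the exact values likely need some trial and error with `set_aspect('equal')`.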