
While preprocessing data and engineering features for an ML algorithm, I found that date and time are very important features for my problem, so I wanted to encode them as cyclical variables. I have about 6k-7k data points in the dataset, with day ranging over 1-31, month over 1-12, hour over 00-23, and minute over 00-59. I therefore split the timestamp into year, month, day, hour, and minute columns.

I then encoded month, day, hour and minute as described here https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time/ so that the algorithm "gets the point" that, e.g., 01 November and 31 October are (in terms of days) nearer to each other than 25 October and 31 October.

Here is the code with which I transformed the components:

import numpy as np

# Transform the cyclical features. Each period equals the number of distinct values,
# so the last value (e.g. minute 59) does not land on the same point as the first (minute 0).
cyclic_df['min_sin'] = np.sin(cyclic_df.minute*(2.*np.pi/60))       # Sine component of minute
cyclic_df['min_cos'] = np.cos(cyclic_df.minute*(2.*np.pi/60))       # Cosine component of minute
cyclic_df['hr_sin'] = np.sin(cyclic_df.hour*(2.*np.pi/24))          # Sine component of hour
cyclic_df['hr_cos'] = np.cos(cyclic_df.hour*(2.*np.pi/24))          # Cosine component of hour
cyclic_df['d_sin'] = np.sin((cyclic_df.day-1)*(2.*np.pi/31))        # Sine component of day (1-31 shifted to 0-30)
cyclic_df['d_cos'] = np.cos((cyclic_df.day-1)*(2.*np.pi/31))        # Cosine component of day
cyclic_df['mnth_sin'] = np.sin((cyclic_df.month-1)*(2.*np.pi/12))   # Sine component of month
cyclic_df['mnth_cos'] = np.cos((cyclic_df.month-1)*(2.*np.pi/12))   # Cosine component of month

# Drop the original columns; every feature this model needs has already been extracted from them.
cyclic_df.drop(['minute', 'hour', 'day', 'month'], axis=1, inplace=True)
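A minimal sketch of the same encoding written as a reusable helper, assuming the period should equal the number of distinct values (60 minutes, 24 hours, and so on) so that the first and last value do not collapse onto the same point. The names `encode_cyclical` and `demo` are hypothetical and not part of the code above, and treating every month as 31 days long is a simplification.

```python
import numpy as np
import pandas as pd


def encode_cyclical(values, period, start=0):
    """Map values onto one full turn of `period` steps and return (sin, cos)."""
    angle = 2.0 * np.pi * (np.asarray(values, dtype=float) - start) / period
    return np.sin(angle), np.cos(angle)


# Hypothetical toy frame standing in for cyclic_df
demo = pd.DataFrame({"hour": [0, 6, 12, 23], "day": [1, 16, 31, 31]})

# Hours run 0-23, so the period is 24 and no shift is needed
demo["hr_sin"], demo["hr_cos"] = encode_cyclical(demo["hour"], period=24)

# Days run 1-31, so shift by 1 and use 31 as the period (assumption: 31-day months)
demo["d_sin"], demo["d_cos"] = encode_cyclical(demo["day"], period=31, start=1)
```

With this choice, hour 23 sits exactly as far from hour 0 as hour 1 does, which is the "wrap-around" behaviour the encoding is meant to provide.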

Now when I plot the transformed components, here is what I get:

[Figure: plot of the new cyclical features: month, day, hour, minute]

My 3 questions:

1) How can I add the hour (00-23), month (1-12), and day (1-31) numbers to the plot?
2) How can I change the font size of each subplot's title? And how can I reduce the margin between the suptitle and the subplots? It's huge!
3) Can I use Seaborn to make the same plot, so that it looks better and uses a better color palette?

Here is the code I used to plot:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(12,12))

fig.suptitle("Representation Of Cyclical Features", fontsize=16)

# One scatter per feature: cosine on the x axis, sine on the y axis
cyclic_df.sample(6000).plot.scatter('d_cos','d_sin', title='Cyclical Days Transformation',
                                    ax=axes[0,0]).set_aspect('equal')
cyclic_df.sample(6000).plot.scatter('mnth_cos','mnth_sin', title='Cyclical Months Transformation',
                                    ax=axes[0,1]).set_aspect('equal')
cyclic_df.sample(6000).plot.scatter('hr_cos','hr_sin', title='Cyclical Hours Transformation',
                                    ax=axes[1,0]).set_aspect('equal')
cyclic_df.sample(6000).plot.scatter('min_cos','min_sin', title='Cyclical Minutes Transformation',
                                    ax=axes[1,1]).set_aspect('equal')
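One possible way to approach questions 1 and 2, as a sketch rather than a definitive answer: draw each cycle with plain matplotlib, place a few text labels just outside the unit circle, pass `fontsize` to `set_title`, and reserve only a thin strip for the suptitle with `tight_layout(rect=...)`. The helper `plot_cycle`, the label radius of 1.15, and the chosen label values are assumptions for illustration. The sketch plots every possible value of each feature rather than the rows of `cyclic_df`.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_cycle(ax, period, labels, title, start=0):
    """Scatter one cyclical feature on the unit circle and label selected values."""
    values = np.arange(start, start + period)
    angle = 2.0 * np.pi * (values - start) / period
    ax.scatter(np.cos(angle), np.sin(angle), s=15)
    for v in labels:
        a = 2.0 * np.pi * (v - start) / period
        # Put the label slightly outside the circle (radius 1.15 is an arbitrary choice)
        ax.text(1.15 * np.cos(a), 1.15 * np.sin(a), str(v), ha="center", va="center")
    ax.set_title(title, fontsize=10)  # per-subplot title size
    ax.set_aspect("equal")
    ax.set_xlim(-1.4, 1.4)
    ax.set_ylim(-1.4, 1.4)


fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 8))
fig.suptitle("Representation Of Cyclical Features", fontsize=16)

plot_cycle(axes[0, 0], 31, [1, 8, 16, 24], "Cyclical Days Transformation", start=1)
plot_cycle(axes[0, 1], 12, [1, 4, 7, 10], "Cyclical Months Transformation", start=1)
plot_cycle(axes[1, 0], 24, [0, 6, 12, 18], "Cyclical Hours Transformation")
plot_cycle(axes[1, 1], 60, [0, 15, 30, 45], "Cyclical Minutes Transformation")

# Keep the subplots within the lower 95% of the figure so the suptitle sits close to them
fig.tight_layout(rect=[0, 0, 1, 0.95])
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the result. Because cosine is on the x axis and sine on the y axis, the first value appears at the right of each circle and the values advance counter-clockwise. For question 3, seaborn's styling can be applied to the same matplotlib code by calling `seaborn.set_theme()` before creating the figure.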
ZelelB
  • (1) I don't understand. (2) Seems unrelated to the code and has surely been answered already. (3) "Use seaborn" to make something "look better" is a strange approach. Seaborn uses certain matplotlib style features and you can certainly use the same, but "better" is too subjective to know what exactly you need. – ImportanceOfBeingErnest Dec 07 '18 at 12:40
  • If you read everything I've written above, you will understand (1) very well. Basically that's THE question. Take the day component: how do I add the day number (day of month, 1 to 31) on the plot, around the circle? Or, for the hour component, the hour number (hour of day, 0 to 23), so that the plot for the hour component looks almost like a clock. – ZelelB Dec 07 '18 at 12:42
  • That comment was meant for you to realize that a person who has read everything may still have problems understanding the question. You may take that seriously and edit the question... or not. Your choice. – ImportanceOfBeingErnest Dec 07 '18 at 12:44
  • Edited the comment. Let me know if you need more explanation. – ZelelB Dec 07 '18 at 12:45
  • Edited the question as well, so the plotting code is there, and (2) is now related to the code :) – ZelelB Dec 07 '18 at 12:47
  • Do you want to replace the axis ticklabels (those that currently read `-1`...`1`)? Do you want to add extra labels arranged circularly? Do you maybe want a circular (polar) plot after all? – ImportanceOfBeingErnest Dec 07 '18 at 12:48
  • I would like to add extra labels arranged circularly, exactly! – ZelelB Dec 07 '18 at 13:38
  • Like [this](https://stackoverflow.com/questions/14432557/matplotlib-scatter-plot-with-different-text-at-each-data-point)? – ImportanceOfBeingErnest Dec 07 '18 at 13:40
  • Yes! But how do I apply it to my case? And it needn't be every point on the circle, just the most significant places. E.g. each quarter point of the circle (approximately) is labeled with its value. For the hour component, this would be 00h, 06h, 12h, 18h. – ZelelB Dec 07 '18 at 13:54
  • So, e.g. 6 o'clock would be `(np.cos(-6/24*2*np.pi+np.pi/2), np.sin(-6/24*2*np.pi+np.pi/2))`, right? – ImportanceOfBeingErnest Dec 07 '18 at 14:32
  • yes, you're right – ZelelB Dec 07 '18 at 14:51
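The formula from the last comments can be sketched as a small function, assuming the goal is a clock-like layout with the first value at the top and values advancing clockwise. The name `clock_position` is hypothetical.

```python
import numpy as np


def clock_position(value, period):
    """Return (x, y) on the unit circle with 0 at the top, advancing clockwise."""
    # Negating the angle reverses the direction; adding pi/2 moves the start to the top
    angle = -2.0 * np.pi * value / period + np.pi / 2
    return np.cos(angle), np.sin(angle)


# Quarter points of a 24-hour dial
for hour in (0, 6, 12, 18):
    x, y = clock_position(hour, 24)
    print(f"{hour:02d}h -> x={x:.2f}, y={y:.2f}")
```

On such a 24-hour dial, 00h lands at the top, 06h at the right, 12h at the bottom and 18h at the left, up to floating-point rounding. These coordinates could be used both for the scatter points and for text labels placed at a slightly larger radius.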

0 Answers