I've got a dataset that looks a bit like this.
df
   headline some_url                 time  is_national
0  Holloway      url  2023-01-11 11:44:27         True
1    London      url  2023-01-11 11:25:10        False
2     Viral      url  2023-01-11 10:43:39        False
3    London      url  2023-01-11 09:41:18         True
4     Royal      url  2023-01-11 15:49:38        False
I've been able to create a categorical column for day of the week thus:
cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['day_of_week'] = df.time.dt.day_name()
df['day_of_week'] = pd.Categorical(df['day_of_week'], categories=cats, ordered=True)
I planned to create an hour column like this:
df['hour'] = df.time.dt.hour
But the hour column comes out as floating point (e.g. 11.0) rather than as integers.
The result when plotted is:
How do I avoid the floating point?
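One possible cause, assuming some rows in time are missing (NaT): pandas then returns the hour as float64 so that the gaps can be stored as NaN. The sketch below uses a small made-up frame (not the real data) to show that behaviour and two ways around it, either casting to the nullable Int64 dtype or dropping the missing rows first.

```python
import pandas as pd

# Hypothetical sample data: the middle timestamp is missing (NaT).
# That a missing value is the cause in the real dataset is an assumption.
df = pd.DataFrame(
    {"time": pd.to_datetime(["2023-01-11 11:44:27", None, "2023-01-11 09:41:18"])}
)

# With a NaT present, the hour comes back as float64 (NaN marks the gap).
hour_float = df["time"].dt.hour

# Option 1: keep every row and use the nullable integer dtype; the gap becomes <NA>.
hour_nullable = df["time"].dt.hour.astype("Int64")

# Option 2: drop the missing timestamps first, which yields plain integers.
hour_dropped = df["time"].dropna().dt.hour

print(hour_float.dtype, hour_nullable.dtype, hour_dropped.dtype)
```

Checking df.time.isna().sum() on the real frame would confirm whether this assumption holds.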
The second query is two-fold. I can produce a histogram of each using the .plot(kind='hist') function in pandas like so:
But the KDE plot produced by this code:
ax = df.hour.plot(kind='kde', title="Articles by hour")
ax.set_xlabel("Hour")
ax.set_ylabel("Number of articles")
looks like this:
Is there a simple way of cropping the plot to avoid minus hours or hours beyond 24?
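A minimal sketch of one way to crop it, assuming SciPy is installed (pandas needs it for KDE plots) and using made-up hour values in place of df.hour: the ind argument of the KDE plot sets the points at which the density is evaluated, and set_xlim clips the visible axis.

```python
import numpy as np
import pandas as pd

# Hypothetical hour values standing in for df.hour.
hours = pd.Series([11, 11, 10, 9, 15, 8, 17, 12, 13, 14], name="hour")

# Evaluate the density only at 200 evenly spaced points between 0 and 24.
ax = hours.plot(kind="kde", ind=np.linspace(0, 24, 200), title="Articles by hour")

# Clip the visible x-axis to the same range.
ax.set_xlim(0, 24)
ax.set_xlabel("Hour")
ax.set_ylabel("Density")  # a KDE curve shows density, not article counts
```

This only crops the view: the estimate itself still places some probability mass outside 0 to 24, so the visible curve integrates to slightly less than 1.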