I've got a dataset that looks a bit like this.

df

   headline  some_url  time                 is_national
0  Holloway  url       2023-01-11 11:44:27  True
1  London    url       2023-01-11 11:25:10  False
2  Viral     url       2023-01-11 10:43:39  False
3  London    url       2023-01-11 09:41:18  True
4  Royal     url       2023-01-11 15:49:38  False

I've been able to create a categorical column for day of the week thus:

cats = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
df['day_of_week'] = df.time.dt.day_name()
df['day_of_week'] = pd.Categorical(df['day_of_week'], categories=cats, ordered=True)

I planned to create an hour column like this:

df['hour'] = df.time.dt.hour

But the hour column comes out as floating-point values rather than integers.

The result when plotted is:

[image: plot of the hour column]

How do I avoid the floating point?
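
One possible cause, offered as an assumption since the full data isn't shown: if the time column contains any missing timestamps (NaT), .dt.hour returns float64, because NaN cannot be stored in a plain integer column. A minimal sketch with made-up data, casting to the nullable Int64 dtype:

```python
import pandas as pd

# Hypothetical sample; the None stands in for a missing timestamp (assumption).
df = pd.DataFrame({
    "time": pd.to_datetime(
        ["2023-01-11 11:44:27", "2023-01-11 09:41:18", None]
    )
})

# With a NaT present, .dt.hour falls back to float64.
hour_float = df["time"].dt.hour

# The nullable integer dtype keeps whole numbers and marks the gap as <NA>.
df["hour"] = hour_float.astype("Int64")

# Alternatively, drop the rows without a timestamp and cast to a plain int.
hour_int = df["time"].dropna().dt.hour.astype(int)
```

If the real column has no missing values, the cause is something else and this sketch does not apply.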

My second question is about plotting. I can produce a histogram of each using the .plot(kind='hist') function in pandas like so:

[image: histogram]

But when I draw a KDE plot with this code:

ax = df.hour.plot(kind='kde', title="Articles by hour")
ax.set_xlabel("Hour")
ax.set_ylabel("Number of articles")

it looks like this:

[image: KDE plot]

Is there a simple way of cropping the plot to avoid minus hours or hours beyond 24?
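
For illustration, one way the cropping might be done is to limit the x-axis on the Axes object that .plot() returns. This is a sketch with made-up hour values, not a confirmed fix. It assumes SciPy is installed (pandas needs it for KDE plots) and uses the Agg backend only so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical hour values standing in for df.hour (assumption).
hours = pd.Series([11, 11, 10, 9, 15, 23, 0, 8], name="hour")

# pandas' KDE plotting relies on SciPy being installed.
ax = hours.plot(kind="kde", title="Articles by hour")
ax.set_xlim(0, 24)  # crop the view to valid hours; the curve itself is unchanged
ax.set_xlabel("Hour")
ax.set_ylabel("Density")
```

Note that set_xlim only hides the tails. The estimated density still extends below 0 and above 24, so the visible area no longer integrates to exactly 1.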

elksie5000