3

I'm working with a dataset that only contains datetime objects and I have retrieved the day of the week and reformatted the time in a separate column like this (conversion functions included below):

    datetime            day_of_week time_of_day
0   2021-06-13 12:56:16 Sunday      20:00:00
5   2021-06-13 12:56:54 Sunday      20:00:00
6   2021-06-13 12:57:27 Sunday      20:00:00
7   2021-07-16 18:55:42 Friday      20:00:00
8   2021-07-16 18:56:03 Friday      20:00:00
9   2021-06-04 18:42:06 Friday      20:00:00
10  2021-06-04 18:49:05 Friday      20:00:00
11  2021-06-04 18:58:22 Friday      20:00:00

What I would like to do is create a kde plot with x-axis = time_of_day (spanning 00:00:00 to 23:59:59), y-axis to be the count of each day_of_week at each hour of the day, and hue = day_of_week. In essence, I'd have seven different distributions representing occurrences during each day of the week.

Here's a sample of the data and my code. Any help would be appreciated:

import calendar

import pandas as pd

df = pd.DataFrame([
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:54',
    '2021-06-13 12:56:54',
    '2021-06-13 12:57:27',
    '2021-07-16 18:55:42',
    '2021-07-16 18:56:03',
    '2021-06-04 18:42:06',
    '2021-06-04 18:49:05',
    '2021-06-04 18:58:22',
    '2021-06-08 21:31:44',
    '2021-06-09 02:14:30',
    '2021-06-09 02:20:19',
    '2021-06-12 18:05:47',
    '2021-06-15 23:46:41',
    '2021-06-15 23:47:18',
    '2021-06-16 14:19:08',
    '2021-06-17 19:08:17',
    '2021-06-17 22:37:27',
    '2021-06-21 23:31:32',
    '2021-06-23 20:32:09',
    '2021-06-24 16:04:21',
    '2020-05-22 18:29:02',
    '2020-05-22 18:29:02',
    '2020-05-22 18:29:02',
    '2020-05-22 18:29:02',
    '2020-08-31 21:38:07',
    '2020-08-31 21:38:22',
    '2020-08-31 21:38:42',
    '2020-08-31 21:39:03',
], columns=['datetime'])

def convert_date(date):
    return calendar.day_name[date.weekday()]

def convert_hour(time):
    return time[:2]+':00:00'

df['day_of_week'] = pd.to_datetime(df['datetime']).apply(convert_date)
df['time_of_day'] = df['datetime'].astype(str).apply(convert_hour)
Trenton McKinney
big_cactus

3 Answers

3

Let's try:

  1. Convert the datetime column with to_datetime.
  2. Create a Categorical column from the day_of_week codes (so the days sort in calendar order).
  3. Normalize time_of_day to a single date (so comparisons work correctly). This treats all events as if they occurred on the same day, which makes the plotting logic much simpler.
  4. Draw the kdeplot.
  5. Set the x-axis formatter to display only HH:MM:SS.

import calendar

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt, dates as mdates


# df = pd.DataFrame({...})

# Convert to datetime
df['datetime'] = pd.to_datetime(df['datetime'])
# Create Categorical Column
cat_type = pd.CategoricalDtype(list(calendar.day_name), ordered=True)
df['day_of_week'] = pd.Categorical.from_codes(
    df['datetime'].dt.day_of_week, dtype=cat_type
)
# Create Normalized Date Column
df['time_of_day'] = pd.to_datetime('2000-01-01 ' +
                                   df['datetime'].dt.time.astype(str))

# Plot
ax = sns.kdeplot(data=df, x='time_of_day', hue='day_of_week')

# X axis format
ax.set_xlim([pd.to_datetime('2000-01-01 00:00:00'),
             pd.to_datetime('2000-01-01 23:59:59')])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))

plt.tight_layout()
plt.show()

Note that the sample size is small here.

(Figure: kdeplot)

If you want counts on the y-axis, histplot may be a better fit:

ax = sns.histplot(data=df, x='time_of_day', hue='day_of_week')

(Figure: histplot)
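
If exact counts per hour and day are the goal, a plain cross-tabulation may help before plotting anything. This is a minimal sketch, not part of the original answer: it uses four timestamps taken from the question's sample, and the names `stamps` and `counts` are illustrative.

```python
import pandas as pd

# Four timestamps taken from the question's sample data
stamps = pd.to_datetime(pd.Series([
    '2021-06-13 12:56:16',
    '2021-06-13 12:56:54',
    '2021-07-16 18:55:42',
    '2021-06-04 18:42:06',
]))

# Rows are hours of the day, columns are day names, cells are event counts
counts = pd.crosstab(
    stamps.dt.hour,
    stamps.dt.day_name(),
    rownames=['hour'],
    colnames=['day_of_week'],
)
print(counts)
```

Calling `counts.plot.bar()` on the result should then give one group of bars per hour, though with this little data a table is probably easier to read.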

Henry Ecker
  • Thanks big time! Would you also happen to know the `return` equivalent for `plt.show()`? I'm returning this plot in a function and `plt.show()` is giving me trouble. – big_cactus Jul 20 '21 at 23:11
0

I would use pandas' Timestamp directly. By the way, your convert_hour function looks wrong: it returns a time_of_day of 20:00:00 for every row.

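import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_context("paper", font_scale=2)
sns.set_style('whitegrid')

df['day_of_week'] = df['datetime'].apply(lambda x: pd.Timestamp(x).day_name())
df['time_of_day'] = df['datetime'].apply(lambda x: pd.Timestamp(x).hour)

# `days` was not defined in the original snippet; assumed here to mean
# the days of the week that actually occur in the data
days = df['day_of_week'].unique()

plt.figure(figsize=(8, 4))

# Draw one KDE curve per day of the week
for day in days:
    sns.kdeplot(df[df.day_of_week == day]['time_of_day'], label=day)

plt.legend()
plt.show()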

(Figure: kdeplot)
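
On the convert_hour point: the slice `time[:2]` takes the first two characters of the whole string, which are the "20" of the year, not the hour. A minimal sketch of the problem and one possible fix, assuming the strings always follow the 'YYYY-MM-DD HH:MM:SS' layout used in the question (`convert_hour_fixed` is a hypothetical name):

```python
def convert_hour(time):
    # Original version from the question: slices the start of the year
    return time[:2] + ':00:00'


def convert_hour_fixed(time):
    # Assumes 'YYYY-MM-DD HH:MM:SS'; the hour sits at positions 11 and 12
    return time[11:13] + ':00:00'


stamp = '2021-06-13 12:56:16'
print(convert_hour(stamp))        # 20:00:00
print(convert_hour_fixed(stamp))  # 12:00:00
```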

The KDE for Wednesday looks a bit strange because its times range from 2 to 20, hence the long tails stretching from about -20 to 40 in the plot.
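
The tails can be reproduced by hand. The sketch below is a standard-library Gaussian KDE, not seaborn's actual code path. It assumes a Scott's-rule bandwidth and an evaluation grid that extends three bandwidths past the data (which I believe matches seaborn's default `cut=3`), and it uses the four Wednesday hours from the question's sample.

```python
import math
import statistics

# Hours of the four Wednesday events in the question's sample data
hours = [2, 2, 14, 20]

# Scott's rule for 1-D data: sample standard deviation times n ** (-1/5)
bandwidth = statistics.stdev(hours) * len(hours) ** (-1 / 5)


def density(x):
    """Gaussian kernel density estimate at x."""
    total = sum(math.exp(-0.5 * ((x - h) / bandwidth) ** 2) for h in hours)
    return total / (len(hours) * bandwidth * math.sqrt(2 * math.pi))


# Assumed plotting range: `cut` bandwidths beyond the smallest and largest value
cut = 3
lo = min(hours) - cut * bandwidth
hi = max(hours) + cut * bandwidth

print(f"bandwidth: {bandwidth:.2f} hours")
print(f"curve spans {lo:.1f} to {hi:.1f}")
print(f"density at hour -5: {density(-5):.4f}")
```

The density is nonzero well outside 0 to 24 simply because each Gaussian kernel is wide. Passing `cut=0` or `clip=(0, 24)` to `sns.kdeplot` should keep the curve inside the valid range of hours.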

cmbfast
0

Here is a simpler approach using df.plot.kde.

I added more data so that each day_of_week has multiple values for the KDE to plot, and simplified the code by removing the helper functions.

df1 = pd.DataFrame([
    '2020-09-01 16:39:03',
    '2020-09-02 16:39:03',
    '2020-09-03 16:39:03',
    '2020-09-04 16:39:03',
    '2020-09-05 16:39:03',
    '2020-09-06 16:39:03',
    '2020-09-07 16:39:03',
    '2020-09-08 16:39:03',
], columns=['datetime'])
df = pd.concat([df,df1]).reset_index(drop=True)
df['day_of_week'] = pd.to_datetime(df['datetime']).dt.day_name()
df['time_of_day'] = pd.to_datetime(df['datetime']).dt.hour
df.pivot(columns='day_of_week').time_of_day.plot.kde()


sharathnatraj