Hi, I am working with categorical data. I want to see device behavior on a given day, using the dataframe shown below.

The toronto_time column is datetime64[D]. I previously used dt.time to remove the date, but that causes a datatype problem: the column becomes dtype object instead of datetime64[D]. Converting it back with pd.to_datetime adds a date again.
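A minimal sketch of the dtype issue described above, plus one possible way to keep only the time of day in a numeric form. The two-row sample and the names `times_only`, `since_midnight`, and `hours` are made up for illustration, and subtracting the normalized (midnight) timestamp is just one option, not necessarily the best one for plotting:

```python
import pandas as pd

# Hypothetical two-row sample mirroring the toronto_time column.
df = pd.DataFrame({
    "toronto_time": pd.to_datetime(["2018-09-08 00:00:50",
                                    "2018-09-08 00:06:33"]),
})

# .dt.time returns plain datetime.time objects, so the dtype becomes object.
times_only = df["toronto_time"].dt.time
print(times_only.dtype)

# Alternative: subtract midnight of the same day to get a timedelta column,
# which stays numeric-like and carries no date.
since_midnight = df["toronto_time"] - df["toronto_time"].dt.normalize()

# Fractional hours since midnight, convenient as a plain numeric x value.
hours = since_midnight.dt.total_seconds() / 3600
print(hours)
```

The `hours` column can be plotted on an ordinary numeric axis running from 0 to 24, at the cost of losing the datetime tick formatting.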

So I left it with the original:

       toronto_time             description
0      2018-09-08 00:00:50      STATS
1      2018-09-08 00:01:55      STATS
2      2018-09-08 00:02:18      DEV_OL
3      2018-09-08 00:05:24      STATS
4      2018-09-08 00:05:34      STATS
5      2018-09-08 00:06:33      CMD_ERROR

I tried plotting it with seaborn using this code:

import matplotlib.pyplot as plt
import matplotlib.dates as md
import seaborn as sns

plt.style.use('seaborn-colorblind')
plt.figure(figsize=(8,6))
sns.swarmplot(x='toronto_time', y='description', data=df)
plt.show()

However, the visualization is compressed into that one day. I want to remove the date from the x-axis labels and stretch the points across the hours of the day (0:00 to 24:00).

This is what I got: [image]

Nikko
  • If you make a transformation of your dataframe like this `df['toronto_time'] = df['toronto_time'].dt.hour`, doesn't it provide the desired output? – Teoretic Sep 12 '18 at 09:30
  • It will just stay with the hour. Categories will point to specific hour. – Nikko Sep 12 '18 at 09:39
  • I'm afraid I don't quite understand you... See my answer and comment on it if it doesn't produce your desired output – Teoretic Sep 12 '18 at 09:46

1 Answer

I'm not sure why you want the minutes and seconds on the graph if your ticks are only on the hour, but you can do it by setting a formatter for your axis. I would also suggest changing your axis limits if you're looking for ticks by the hour.

import pandas as pd

import matplotlib.pyplot as plt
import matplotlib.dates as md
import seaborn as sns

df = pd.DataFrame({'toronto_time': ['2018-09-08 00:00:50',
                                    '2018-09-08 01:01:55',
                                    '2018-09-08 02:02:18',
                                    '2018-09-08 03:05:24',
                                    '2018-09-08 04:05:34',
                                    '2018-09-08 05:06:33'], 
                    'description': ['STATS', 'STATS', 'DEV_OL', 'STATS', 'STATS', 'CMD_ERROR']})
df['toronto_time'] = pd.to_datetime(df['toronto_time'], format='%Y-%m-%d %H:%M:%S')

plt.style.use('seaborn-colorblind')
fig, ax = plt.subplots(figsize=(8,6))
sns.swarmplot(x='toronto_time', y='description', data=df, ax=ax)
ax.set_xlim(df['toronto_time'].min()-pd.Timedelta(1,'h'),
            df['toronto_time'].max()+pd.Timedelta(1,'h'))
ax.xaxis.set_major_formatter(md.DateFormatter('%H:%M:%S'))

plt.show()


Here's a nice example showing how to use a locator to define how the ticks are spaced as well: http://leancrew.com/all-this/2015/01/labeling-time-series/
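A minimal sketch of the locator idea, under a few assumptions: the x-limits are pinned to the full day (midnight to midnight) to match the 0:00 to 24:00 request, a plain matplotlib `scatter` stands in for the swarmplot so the example focuses only on the axis setup, and the `Agg` backend is selected so it runs without a display. The sample data and the name `day_start` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import matplotlib.dates as md
import pandas as pd

df = pd.DataFrame({
    "toronto_time": pd.to_datetime(["2018-09-08 00:00:50",
                                    "2018-09-08 07:01:55",
                                    "2018-09-08 13:02:18",
                                    "2018-09-08 21:05:24"]),
    "description": ["STATS", "STATS", "DEV_OL", "CMD_ERROR"],
})

fig, ax = plt.subplots(figsize=(10, 6))
# Plain scatter used here in place of sns.swarmplot (an assumption of this sketch).
ax.scatter(df["toronto_time"], df["description"])

# Midnight at the start of the plotted day, then span exactly 24 hours.
day_start = df["toronto_time"].min().normalize()
ax.set_xlim(day_start, day_start + pd.Timedelta(days=1))

# One major tick per hour, labelled with the time only (no date).
ax.xaxis.set_major_locator(md.HourLocator())
ax.xaxis.set_major_formatter(md.DateFormatter("%H:%M"))

# Rotate the labels so the hourly ticks do not overlap.
fig.autofmt_xdate()
```

With hourly ticks the labels get crowded on a narrow figure, so a wider `figsize` or a coarser `HourLocator(interval=2)` may read better.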

Dan
  • Nice! Thanks! Although my xticks shows (0, 3, 6, 9, 12 ... 21:00, 0:00). How can I display all hours? (0:00 to 23:59 or 0) – Nikko Sep 13 '18 at 07:52
  • It's in the link at the bottom of the answer: try `ax.xaxis.set_major_locator(md.HourLocator())` – Dan Sep 13 '18 at 09:12