I want a scatter plot of duration(mins) versus start time (the time of day, irrespective of which date it was on), like this:
I have a CSV file, commute.csv, which looks like this:
date, prediction, start, stop, duration, duration(mins), Day of week
14/08/2015, , 08:02:00, 08:22:00, 00:20:00, 20, Fri
25/08/2015, , 18:16:00, 18:27:00, 00:11:00, 11, Tue
26/08/2015, , 08:26:00, 08:46:00, 00:20:00, 20, Wed
26/08/2015, , 18:28:00, 18:46:00, 00:18:00, 18, Wed
The full CSV file is here.
I can import the CSV file like so:
import pandas as pd
times = pd.read_csv('commute.csv', parse_dates=[[0, 2], [0, 3]], dayfirst=True)
times.head()
Out:
date_start date_stop prediction duration duration(mins) Day of week
0 2015-08-14 08:02:00 2015-08-14 08:22:00 NaN 00:20:00 20 Fri
1 2015-08-25 18:16:00 2015-08-25 18:27:00 NaN 00:11:00 11 Tue
2 2015-08-26 08:26:00 2015-08-26 08:46:00 NaN 00:20:00 20 Wed
3 2015-08-26 18:28:00 2015-08-26 18:46:00 NaN 00:18:00 18 Wed
4 2015-08-28 08:37:00 2015-08-28 08:52:00 NaN 00:15:00 15 Fri
I am now struggling to plot duration(mins) versus start time (without the date). Please help!
@jezrael has been a great help... one of the comments on issue 8113 proposes using a variant of df.plot(x=x, y=y, style="."). I tried it:
times.plot(x='start', y='duration(mins)', style='.')
However, this does not produce my intended plot: the X axis is stretched so that the data points are evenly spaced in X, instead of being positioned according to their time of day:
Is there no way to plot against time?
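For reference, here is a minimal sketch of one possible workaround, not necessarily the best one: turn the start time into a plain number (hours since midnight) so the X axis is numeric rather than categorical. It assumes the CSV is read without parse_dates so that the start column is still present, and uses skipinitialspace=True because of the space after each comma. The names SAMPLE, load_with_start_hour, plot_duration_vs_start and the start_hour column are my own inventions for illustration, and the inline sample stands in for commute.csv.

```python
import io

import pandas as pd

# Sample rows copied from the question; the real data would come from commute.csv.
SAMPLE = """date, prediction, start, stop, duration, duration(mins), Day of week
14/08/2015, , 08:02:00, 08:22:00, 00:20:00, 20, Fri
25/08/2015, , 18:16:00, 18:27:00, 00:11:00, 11, Tue
26/08/2015, , 08:26:00, 08:46:00, 00:20:00, 20, Wed
26/08/2015, , 18:28:00, 18:46:00, 00:18:00, 18, Wed
"""


def load_with_start_hour(source):
    """Read the CSV and add a hypothetical 'start_hour' column (hours since midnight)."""
    # skipinitialspace drops the blank that follows each comma in the file.
    times = pd.read_csv(source, skipinitialspace=True)
    # Parse only the time of day; the date that pandas fills in is ignored below.
    start = pd.to_datetime(times["start"], format="%H:%M:%S")
    times["start_hour"] = (
        start.dt.hour + start.dt.minute / 60 + start.dt.second / 3600
    )
    return times


def plot_duration_vs_start(times):
    """Scatter duration(mins) against the numeric time of day; needs matplotlib."""
    ax = times.plot.scatter(x="start_hour", y="duration(mins)")
    ax.set_xlabel("start time (hours since midnight)")
    return ax


times = load_with_start_hour(io.StringIO(SAMPLE))
```

Calling plot_duration_vs_start(times) should then place each point by its actual time of day, since the X values are ordinary floats. With the real file, load_with_start_hour('commute.csv') would replace the StringIO sample.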