I have this data:

Time = ['2017-03-13 00:01:00', '2017-03-13 00:02:00', '2017-03-13 23:59:00']
Speed = [20, 40.5, 100]
Kilometer = [1.4, 2.0, 4.1]   
N130317 = pd.DataFrame({'Time':Time, 'Speed':Speed, 'Kilometer':Kilometer})

I have converted the time using:

N130317['Time'] = pd.to_datetime(N130317['Time'], format = '%Y-%m-%d %H:%M:%S')
N130317['Time'] = pd.to_datetime(N130317['Time'], format).apply(lambda x: x.time())
N130317['Time'] = N130317['Time'].map(lambda t: t.strftime('%H:%M'))

I have made a plot using:

marker_size=1 #sets size of dots
cm = plt.get_cmap('plasma_r') #sets colour scheme
plt.scatter(N130317['Kilometer'], N130317['Time'], marker_size, c=N130317['Speed'], cmap=cm)
plt.title("NDW 13-03-17")
plt.xlabel("Kilometer")
plt.ylabel("Time")
plt.colorbar().set_label("Speed", labelpad=+1) #Makes a legend
plt.show()

But the graph shows up like this (every time stamp is drawn on the y axis, which evidently doesn't have the space for them; there is a time stamp for every minute in my data):

Picture

What can I do to solve this? Any help would be greatly appreciated. I have tried so many things online.

Zephyr
nielsen

1 Answer

I used these lines to create some sample data; replace them with your own data:

from itertools import product

# One timestamp string per second of the day, zero-padded to two digits
Time = [f'2017-03-13 {H:02d}:{M:02d}:{S:02d}'
        for H, M, S in product(range(24), range(60), range(60))]
Speed = list(130*np.random.rand(len(Time)))
Kilometer = list(50*np.random.rand(len(Time)))
N130317 = pd.DataFrame({'Time':Time, 'Speed':Speed, 'Kilometer':Kilometer})

I converted the N130317['Time'] column to timestamps with this line:

N130317['Time'] = pd.to_datetime(N130317['Time'], format = '%Y-%m-%d %H:%M:%S')

Then I set the y-axis major formatter to a date format, so that only hours and minutes are shown on the ticks:

import matplotlib.dates as md

ax=plt.gca()
xfmt = md.DateFormatter('%H:%M')
ax.yaxis.set_major_formatter(xfmt)
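
If the default tick spacing is still too dense or too sparse, a locator can be set next to the formatter. The following is only a minimal sketch and not part of the original answer: it assumes the Time column already holds datetime values, and the helper name `format_time_axis` and the 3-hour interval are illustrative choices.

```python
import matplotlib.dates as md
import matplotlib.pyplot as plt
import pandas as pd


def format_time_axis(ax, interval_hours=3):
    """Label the y axis as HH:MM with one major tick every `interval_hours` hours."""
    ax.yaxis.set_major_locator(md.HourLocator(interval=interval_hours))
    ax.yaxis.set_major_formatter(md.DateFormatter('%H:%M'))
    return ax


# The three rows from the question, parsed into real datetimes.
df = pd.DataFrame({
    'Time': pd.to_datetime(['2017-03-13 00:01:00', '2017-03-13 00:02:00', '2017-03-13 23:59:00'],
                           format='%Y-%m-%d %H:%M:%S'),
    'Speed': [20, 40.5, 100],
    'Kilometer': [1.4, 2.0, 4.1],
})

fig, ax = plt.subplots()
ax.scatter(df['Kilometer'], df['Time'], s=1, c=df['Speed'], cmap='plasma_r')
format_time_axis(ax)
# plt.show()  # uncomment to display the figure
```

The formatter only changes how tick values are printed; the locator decides where the ticks go, so the two can be adjusted independently.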

The whole code is:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as md
from itertools import product

# One timestamp string per second of the day, zero-padded to two digits
Time = [f'2017-03-13 {H:02d}:{M:02d}:{S:02d}'
        for H, M, S in product(range(24), range(60), range(60))]
Speed = list(130*np.random.rand(len(Time)))
Kilometer = list(50*np.random.rand(len(Time)))
N130317 = pd.DataFrame({'Time':Time, 'Speed':Speed, 'Kilometer':Kilometer})

N130317['Time'] = pd.to_datetime(N130317['Time'], format = '%Y-%m-%d %H:%M:%S')

marker_size = 1  # sets size of dots
cm = plt.get_cmap('plasma_r') #sets colour scheme
plt.scatter(N130317['Kilometer'], N130317['Time'], marker_size, c=N130317['Speed'], cmap=cm)
ax=plt.gca()
xfmt = md.DateFormatter('%H:%M')
ax.yaxis.set_major_formatter(xfmt)
plt.title("NDW 13-03-17")
plt.xlabel("Kilometer")
plt.ylabel("Time")
plt.colorbar().set_label("Speed", labelpad=+1) #Makes a legend
plt.show()

and it gives me this plot:

enter image description here


Please note that the format passed to pd.to_datetime() has to describe the whole input string, not the format you want to end up with. If you run this code:

hour = '2017-03-13 00:00:00'
pd.to_datetime(hour, format = '%H:%M')

You will get this error message:

ValueError: time data '2017-03-13 00:00:00' does not match format '%H:%M' (match)

So you need to pass the format that matches the string, as in the conversion line above:

hour = '2017-03-13 00:00:00'
pd.to_datetime(hour, format = '%Y-%m-%d %H:%M:%S')

Whether you run into this depends on how your Time column is stored; I did not encounter the issue since I re-created the data as described above. The '%H:%M' format belongs in the DateFormatter, which only controls how the ticks are displayed.
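
The difference between a datetime column and a column of formatted strings is what decides how matplotlib draws the axis. As a small sketch (the variable names are illustrative, not from the question): a matching format parses into datetime values, a mismatched format raises ValueError, and strftime turns the values back into plain strings, which matplotlib then treats as one category per distinct label.

```python
import pandas as pd

raw = pd.Series(['2017-03-13 00:01:00', '2017-03-13 00:02:00', '2017-03-13 23:59:00'])

# A format that describes the whole string parses cleanly into datetimes.
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S')
is_datetime = pd.api.types.is_datetime64_any_dtype(parsed)

# A format that covers only part of the string is rejected.
try:
    pd.to_datetime(raw, format='%H:%M')
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# strftime returns text, so the column is no longer a datetime column.
as_text = parsed.dt.strftime('%H:%M')
still_datetime = pd.api.types.is_datetime64_any_dtype(as_text)

print(is_datetime, mismatch_raised, still_datetime)
```

This is why converting the column with strftime before plotting brings back one tick label per distinct time stamp: the axis no longer knows it is plotting dates.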

Version info

Python      3.7.0
matplotlib  3.2.1
numpy       1.18.4
pandas      1.0.4
Zephyr
  • You are a magician. Do I need the other time formatting lines (the 3 before N130317['Time'] = pd.to_datetime(N130317['Time'], format = '%H:%M'))? – nielsen Jun 09 '20 at 19:34
  • No, you don't: the line `N130317['Time'] = pd.to_datetime(N130317['Time'], format = '%H:%M')` alone is what you need, now I update the answer – Zephyr Jun 09 '20 at 19:40
  • Yes of course, I'm still trying it out. I get an error: "time data '2017-03-13 00:00:00' does not match format '%H:%M' (match)" – nielsen Jun 09 '20 at 19:43
  • That worked, but continuing with the graph makes a graph that looks like this https://www.imageupload.net/image/c937p Opening the dataframe, it has all time stamps from 00:00 to 23:59, but the date is still there despite changing the format. – nielsen Jun 09 '20 at 20:05
  • Only this line will make the column into %H:%M format: N130317['Time'] = N130317['Time'].map(lambda t: t.strftime('%H:%M')) But using this gives the error when i make the graph: "DateFormatter found a value of x=0, which is an illegal date; this usually occurs because you have not informed the axis that it is plotting dates, e.g., with ax.xaxis_date()" @Andrea Blengino – nielsen Jun 09 '20 at 20:14
  • Could you update the question with the code you are using now? And also a subset of your source data is surely useful – Zephyr Jun 09 '20 at 20:15
  • I just tried using your example, and I have the same problem there – nielsen Jun 09 '20 at 20:20
  • Yes exactly what I did. I am so sorry about this. Yes it looks like I need to update pandas, matplotlib and numpy. I checked with .dtypes and it has been converted to a datetime format. I want to ask. When you look at your dataframe in the time column, does it also show the date despite using N130317['Time'] = pd.to_datetime(N130317['Time'], format = '%H:%M')? – nielsen Jun 09 '20 at 20:47
  • Maybe I found the misunderstanding, I re-updated the whole code above. Please, re-try now with the code you find above – Zephyr Jun 09 '20 at 20:57
  • I still get a plot where the data is only in the middle of the image and the ticks are all 00:00 – nielsen Jun 09 '20 at 21:08
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/215618/discussion-between-andrea-blengino-and-nielsen). – Zephyr Jun 09 '20 at 21:14
  • Solved the problem by adding the line plt.ylim(N130317['Time'][0], N130317['Time'][N130317.index[-1]]) – nielsen Jun 11 '20 at 13:03
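
The fix from the last comment pins the y axis to the time span of the data, so the points fill the plot instead of sitting in the middle of it. Below is a hedged sketch of the same idea on the three rows from the question. It uses the column's min and max rather than the first and last rows, which is a substitution made here so that it also works when the rows are not sorted by time.

```python
import matplotlib.dates as md
import matplotlib.pyplot as plt
import pandas as pd

N130317 = pd.DataFrame({
    'Time': pd.to_datetime(['2017-03-13 00:01:00', '2017-03-13 00:02:00', '2017-03-13 23:59:00'],
                           format='%Y-%m-%d %H:%M:%S'),
    'Speed': [20, 40.5, 100],
    'Kilometer': [1.4, 2.0, 4.1],
})

fig, ax = plt.subplots()
points = ax.scatter(N130317['Kilometer'], N130317['Time'], s=1,
                    c=N130317['Speed'], cmap='plasma_r')
ax.yaxis.set_major_formatter(md.DateFormatter('%H:%M'))
# Limit the y axis to the earliest and latest time stamps in the data.
ax.set_ylim(N130317['Time'].min(), N130317['Time'].max())
fig.colorbar(points, ax=ax).set_label('Speed')
# plt.show()  # uncomment to display the figure
```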