36

I have a dataframe with two columns of datetime.time values. I'd like to scatter plot them, ideally with the axes displaying the times. But

df.plot(kind='scatter', x='T1', y='T2')

dumps a bunch of internal plot errors ending with a KeyError on 'T1'.

Alternatively, I try

plt.plot_date(x=df.loc[:,'T1'], y=df.loc[:,'T2'])
plt.show()

and I get 'Exception in Tkinter callback' with a long stack trace ending in

return _from_ordinalf(x, tz)
  File "/usr/lib/python3/dist-packages/matplotlib/dates.py", line 224, in _from_ordinalf
microsecond, tzinfo=UTC).astimezone(tz)
TypeError: tzinfo argument must be None or of a tzinfo subclass, not type 'str'

Any pointers?

jma
  • Since you didn't specify a tz argument, I'm guessing it's trying to parse one out of your datetime. Just a guess. Can you post an example of your datetime format? – Bob Haffner Dec 14 '14 at 18:53
  • These are datetime.time's, so TZ doesn't really make sense to me. `df.loc[:,'T1'].values[0] ==> datetime.time(0, 15, 43)` – jma Dec 14 '14 at 19:23
  • Agreed. Sorry, not much help – Bob Haffner Dec 14 '14 at 20:21
  • How about a [minimal example](http://stackoverflow.com/help/mcve) to recreate the error? – hitzg Dec 16 '14 at 17:06

5 Answers

44

Not a real answer but a workaround, as suggested by Tom Augspurger, is that you can just use the working line plot type and specify dots instead of lines:

df.plot(x='x', y='y', style=".")
Kartoch
Aaron Schumacher
  • But the figure produced in this way and the scatter plot are not the same. – ZillGate Dec 10 '15 at 20:29
  • To elaborate on @ZillGate's comment: in this case, the x axis is just the list of "x" values. They are not necessarily in order, and they are not spaced appropriately (unless your x axis values are evenly spaced to begin with). – adam.r Aug 07 '18 at 12:49
  • Also, instead of '.' one can use ',' for smaller points and 'o' for bigger points. – DDR Nov 27 '18 at 12:59
11

Building on Mike N's answer: convert to Unix time so the scatter plot works, then transform the axis labels back from int64s to strings:

type(df.ts1[0])

pandas.tslib.Timestamp

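import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timezone

# Timestamps as nanoseconds since the Unix epoch
df['t1'] = df.ts1.astype(np.int64)
df['t2'] = df.ts2.astype(np.int64)

fig, ax = plt.subplots(figsize=(10,6))
df.plot(x='t1', y='t2', kind='scatter', ax=ax)
# Relabel ticks as times; tz=timezone.utc avoids a shift by the local UTC offset
ax.set_xticklabels([datetime.fromtimestamp(ts / 1e9, tz=timezone.utc).strftime('%H:%M:%S') for ts in ax.get_xticks()])
ax.set_yticklabels([datetime.fromtimestamp(ts / 1e9, tz=timezone.utc).strftime('%H:%M:%S') for ts in ax.get_yticks()])
plt.show()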


dvmlls
5

Not an answer, but I can't edit the question or put this much in a comment, I think.

Here is a reproducible example:

from datetime import datetime
import pandas as pd
df = pd.DataFrame({'x': [datetime.now() for _ in range(10)], 'y': range(10)})
df.plot(x='x', y='y', kind='scatter')

This gives KeyError: 'x'.

Interestingly, you do get a plot with just df.plot(x='x', y='y'); it chooses poorly for the default x range because the times are just nanoseconds apart, which is weird, but that's a separate issue. It seems like if you can make a line graph, you should be able to make a scatterplot too.

There is a pandas github issue about this problem, but it was closed for some reason. I'm going to go comment there and see if we can re-start that conversation.

Is there some clever work-around for this? If so, what?

Aaron Schumacher
  • A non-clever work-around is to convert to unix time (int64's), scatter plot, and then fiddle with axis ticks and labels. – jma Apr 21 '15 at 05:39
  • @jma: instead of fiddling, you can try [`matplotlib.dates`](https://matplotlib.org/api/dates_api.html) – serv-inc Oct 21 '18 at 14:21
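
A minimal sketch of what the `matplotlib.dates` suggestion might look like, assuming you call matplotlib's `scatter` directly instead of `df.plot(kind='scatter')`. The sample data and the `%H:%M:%S` format are made up for illustration.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data: ten timestamps one minute apart
start = datetime(2014, 12, 14, 18, 0, 0)
df = pd.DataFrame({
    "x": [start + timedelta(minutes=i) for i in range(10)],
    "y": range(10),
})

fig, ax = plt.subplots()
# Pass the datetime values straight to matplotlib, which converts them itself
ax.scatter(df["x"].to_numpy(), df["y"].to_numpy())
# Let matplotlib.dates render the tick labels instead of relabelling by hand
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

With an interactive backend, `plt.show()` displays the figure; the points are placed at their true positions on the time axis rather than at evenly spaced indices.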
2

Here's a basic workaround to get you started.

import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

def scatter_date(df, x, y, datetimeformat):
    """Plot one or more y columns against a column of date strings."""
    if not isinstance(y, list):
        y = [y]
    # Parse the date strings and convert them to matplotlib date numbers
    dates = df[x].apply(
        lambda z: mdates.date2num(datetime.datetime.strptime(z, datetimeformat)))
    for yi in y:
        plt.plot_date(dates, df[yi], label=yi)
    plt.legend()
    plt.xlabel(x)

# Example usage
scatter_date(data, x='date', y=['col1', 'col2'], datetimeformat='%Y-%m-%d')
J Wang
1

It's not pretty, but as a quick hack you can convert your datetime values to timestamps using .timestamp() before loading them into pandas, and scatter plots will work just fine (although with a completely unusable x-axis).
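
For the `datetime.time` columns in the original question, a similar numeric conversion can be sketched with the axis labels turned back into times. This is only an illustration under assumptions: `time_to_seconds`, `seconds_to_label` and `scatter_times` are hypothetical helper names, and the sample data is made up (only the column names `T1` and `T2` come from the question).

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for an interactive window
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter


def time_to_seconds(t):
    """Seconds since midnight for a datetime.time value."""
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def seconds_to_label(value, pos=None):
    """Format seconds since midnight as HH:MM:SS (values outside one day wrap around)."""
    total = int(round(value)) % 86400
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def scatter_times(df, x, y):
    """Scatter two datetime.time columns with time-formatted axes (hypothetical helper)."""
    fig, ax = plt.subplots()
    ax.scatter(df[x].map(time_to_seconds), df[y].map(time_to_seconds))
    ax.xaxis.set_major_formatter(FuncFormatter(seconds_to_label))
    ax.yaxis.set_major_formatter(FuncFormatter(seconds_to_label))
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    return ax


# Made-up sample data using the question's column names
df = pd.DataFrame({
    "T1": [datetime.time(0, 15, 43), datetime.time(6, 30, 0), datetime.time(18, 5, 9)],
    "T2": [datetime.time(1, 0, 0), datetime.time(7, 45, 30), datetime.time(23, 59, 59)],
})
ax = scatter_times(df, "T1", "T2")
```

Plotting plain seconds keeps the scatter machinery on ordinary numbers, and the formatter only changes how the ticks are displayed, so the points stay correctly spaced.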

Mike N