matplotlib/seaborn scatter plot with datetime object on x-axis and days with multiple entries

Question

I have a data frame and would like to make a scatter plot of how long it took for a request to be completed (Days) on the y-axis against the day the request was filed (Received, which is a datetime object) on the x-axis.

Some values of 'Received' appear more than once because sometimes two requests were filed on the same day.

Here are some of my data and the code I have tried:

Received          Days
2012-08-01        41.0 
2014-12-31       692.0
2015-02-25       621.0
2015-10-15       111.0

sns.regplot(x=simple_denied["Received"], y=simple_denied["days"], marker="+", fit_reg=False)


plt.plot('Received','days', simple_denied, color='black')

I think you may want to use a bar plot, line plot or heatmap instead of a scatter plot, since a scatter plot requires two continuous variables. If there are duplicates in Received, try aggregating Days first before plotting, for example by taking the mean. – steven, Feb 15 '19 at 03:19
https://jakevdp.github.io/PythonDataScienceHandbook/04.14-visualization-with-seaborn.html#Bar-plots – steven, Feb 15 '19 at 03:23
I would like to use a scatter plot to avoid having to aggregate the data. The duplicated entries share the same x value but have different y values. – Graham Streich, Feb 15 '19 at 03:46
I don't want a line graph by grouping. I think bar plots grouped by month would complement the scatter plots, but that is a separate question. – Graham Streich, Feb 15 '19 at 03:47

jwalton · Answer 1 · 2019-02-15T15:35:21.240


Let's start by setting up your data. I actually added another date '2014-12-31' to your example dataset, so that we can verify that our plotting routine works when we have multiple requests received on the same day:

import matplotlib.pyplot as plt
plt.style.use('seaborn')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8'
import pandas as pd
import numpy as np

dates = np.array(['2012-08-01', '2014-12-31',
                  '2014-12-31', '2015-02-25',
                  '2015-10-15'], dtype='datetime64')

days = np.array([41, 692, 50, 621, 111])

df = pd.DataFrame({'Received' : dates, 'Days' : days})

The dataframe created should approximate what you have. Producing the scatter plot you want is now straightforward:

fig, ax = plt.subplots(1, 1)

ax.scatter(df['Received'], df['Days'], marker='+')
ax.set_xlabel("Received")
ax.set_ylabel("Days")

This gave me the following plot:

As noted by @ImportanceOfBeingErnest in the comments below, you need a recent version of pandas for this routine to work.
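To check that both same-day requests survive plotting, a minimal sketch could count the points that `scatter` actually drew and tidy up the date labels. It assumes a reasonably recent pandas and matplotlib, and it selects the non-interactive Agg backend so that it runs without a display. The `DateFormatter` pattern and the names `points` and `n_points` are my own choices, not part of the routine above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Same sample data as above, with '2014-12-31' appearing twice.
df = pd.DataFrame({
    "Received": pd.to_datetime(["2012-08-01", "2014-12-31", "2014-12-31",
                                "2015-02-25", "2015-10-15"]),
    "Days": [41, 692, 50, 621, 111],
})

fig, ax = plt.subplots(1, 1)
points = ax.scatter(df["Received"], df["Days"], marker="+")
ax.set_xlabel("Received")
ax.set_ylabel("Days")

# Show year-month tick labels and rotate them so they do not overlap.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
fig.autofmt_xdate()

# scatter draws one point per row, so duplicated dates are not collapsed.
n_points = len(points.get_offsets())
print(n_points, "points drawn for", df["Received"].nunique(), "distinct dates")
```

The two requests received on 2014-12-31 end up stacked at the same x position with different y values, which is the behaviour the question asks for.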

edited Feb 15 '19 at 15:35

answered Feb 15 '19 at 10:22

jwalton


Interesting. In which versions of numpy and matplotlib does this work? – ImportanceOfBeingErnest Feb 15 '19 at 11:45
I tested with (matplotlib 3.0.2, numpy 0.15.4), (2.2.3, 0.15.2), (2.0.2, 0.14.5) and it fails with a `TypeError: invalid type promotion` error. – ImportanceOfBeingErnest Feb 15 '19 at 11:53
I'm running matplotlib 3.0.2 and numpy 1.16.1. I'm also running pandas 0.24.1. I think this is to do with how pandas converts dates between pandas and matplotlib. – jwalton Feb 15 '19 at 12:00
Your numpy versions seem way off. Did you mean you tested on numpy 1.15.4, 1.15.2,... rather than 0.15.4, 0.15.2,... ? – jwalton Feb 15 '19 at 12:08
Yes, replace each zero by one. This is great, hopefully 1.16.1 will be stable enough to be added to conda default channel soon. – ImportanceOfBeingErnest Feb 15 '19 at 12:12
Can you test if that would make `ax.scatter('Received', 'Days', data=df, marker='+')` work as well? – ImportanceOfBeingErnest Feb 15 '19 at 12:26
1.16.1 is listed as Production/Stable on PyPI, so hopefully conda shouldn't be far behind. And yes, `ax.scatter('Received', 'Days', data=df, marker='+')` works too! – jwalton Feb 15 '19 at 12:35
No I think you're right, it's rather the pandas version that matters. It works with pandas 0.24, but not with pandas 0.23. There should also be a `FutureWarning` raised, correct? – ImportanceOfBeingErnest Feb 15 '19 at 13:51
To get rid of that warning one would need to deregister the converters. `pd.plotting.deregister_matplotlib_converters()`. – ImportanceOfBeingErnest Feb 15 '19 at 13:59
I may have gotten lost in these comments but I cannot get any of this code to work. – Graham Streich Feb 15 '19 at 17:23
For this code to work you need matplotlib 3.0.2, numpy 1.15 or higher, and pandas 0.24.1. This will work, but might throw a warning. To get rid of that warning, you can call `pd.plotting.deregister_matplotlib_converters()`. – ImportanceOfBeingErnest Feb 15 '19 at 17:25
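Besides deregistering, pandas also exposes `pd.plotting.register_matplotlib_converters()`, which registers its datetime converters explicitly. A hedged sketch of both options follows. Whether any warning appears at all depends on the installed pandas version, the Agg backend is an assumption so that the snippet runs without a display, and the sample frame is the one from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Option 1: register pandas' datetime converters explicitly.
pd.plotting.register_matplotlib_converters()

# Option 2 (suggested in the comments): remove pandas' converters and
# rely on matplotlib's own datetime handling instead.
# pd.plotting.deregister_matplotlib_converters()

df = pd.DataFrame({
    "Received": pd.to_datetime(["2012-08-01", "2014-12-31", "2014-12-31",
                                "2015-02-25", "2015-10-15"]),
    "Days": [41, 692, 50, 621, 111],
})

fig, ax = plt.subplots()
# Column names are resolved against the frame passed via the data keyword.
points = ax.scatter("Received", "Days", data=df, marker="+")
n_drawn = len(points.get_offsets())
print(n_drawn, "points drawn")
```

Only one of the two calls is needed; which one suits a given setup is something to try out rather than take from this sketch.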

score 0 · Answer 2 · answered Feb 15 '19 at 11:48


You hit two cases which don't work. sns.regplot does not work with dates, and plt.plot needs the data passed via the data keyword (it cannot know which data to use just from the names of the columns).

So any of the following would give you a scatter plot of the data:

sns.scatterplot(x="Received", y="days", data=simple_denied, marker="+")
sns.scatterplot(x=simple_denied["Received"], y=simple_denied["days"], marker="+")
plt.scatter(simple_denied["Received"].values, simple_denied["days"].values, marker="+")
plt.plot(simple_denied["Received"].values, simple_denied["days"].values, marker="+", ls="")
plt.plot("Received", "days", data=simple_denied, marker="+", ls="")
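For anyone who wants to try the last variant without the original data, here is a self-contained sketch. The frame is a stand-in built from the four rows shown in the question (using the lowercase `days` column name that the code refers to), the Agg backend is an assumption so that it runs without a display, and `n_markers` is just an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for simple_denied, built from the rows shown in the question.
simple_denied = pd.DataFrame({
    "Received": pd.to_datetime(["2012-08-01", "2014-12-31",
                                "2015-02-25", "2015-10-15"]),
    "days": [41.0, 692.0, 621.0, 111.0],
})

fig, ax = plt.subplots()
# Column names are looked up in the frame passed via the data keyword;
# ls="" turns off the connecting line so only the markers remain.
(line,) = ax.plot("Received", "days", data=simple_denied, marker="+", ls="")
ax.set_xlabel("Received")
ax.set_ylabel("days")

n_markers = len(line.get_xdata())
print(n_markers, "markers drawn")
```

Using `plot` with an empty line style gives the same picture as `scatter` here; `scatter` becomes the better choice once marker size or colour should vary per point.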

answered Feb 15 '19 at 11:48

ImportanceOfBeingErnest


Thank you, the `plt.scatter(simple_denied["Received"].values, simple_denied["days"].values, marker="+")` works. The sns plots both give me the error `AttributeError: module 'seaborn' has no attribute 'scatterplot'` even though I pip installed the latest seaborn package. The other two plt.plot() create blank graphs. Please let me know if there is something I am missing. – Graham Streich Feb 15 '19 at 17:20
Concerning seaborn, yes, `scatterplot` is quite new. So probably your update was unsuccessful. For the `plot` commands, maybe also your matplotlib version is too old? – ImportanceOfBeingErnest Feb 15 '19 at 17:23
Do the numbers on the axes correspond to the range of values you would expect from the data? Can you try to reduce the dataset to see if that makes a difference (e.g. using `df.head()` instead of `df`)? Can you try a different `marker`? Also make sure you are actually running the versions you think you are by printing them within the code, e.g. `print(module.__version__)` for each module, and comparing the output to what you expect. – ImportanceOfBeingErnest Feb 15 '19 at 23:45
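The version check suggested above could be sketched roughly as follows. The helper name `installed_versions` is hypothetical, and the list of packages is simply the set mentioned in this thread; a package that cannot be imported is reported as `None` instead of raising.

```python
import importlib


def installed_versions(names=("matplotlib", "numpy", "pandas", "seaborn")):
    """Return {package: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


versions = installed_versions()
for name, version in versions.items():
    print(name, version)
```

Running this inside the same script or notebook that produces the plot matters, because a `pip install` into a different environment would not change what that interpreter imports. If seaborn reports a version below 0.9, `sns.scatterplot` will be missing.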
