
I have a data frame and would like to make a scatter plot with the number of days it took for a request to be completed on the y-axis and the day the request was filed (Received, which is a datetime object) on the x-axis.

Some values of 'Received' have two entries because sometimes two requests were filed on the same day.

Here are some of my data and the code I have tried:

Received          Days
2012-08-01        41.0 
2014-12-31       692.0
2015-02-25       621.0
2015-10-15       111.0

sns.regplot(x=simple_denied["Received"], y=simple_denied["days"], marker="+", fit_reg=False)


plt.plot('Received','days', simple_denied, color='black')
Graham Streich
  • I think you may want to use a bar plot, line plot or heatmap instead of a scatter plot, since a scatter plot requires two continuous variables. If there are duplicates in Received, try aggregating the Days first before plotting, for example by taking the mean. – steven Feb 15 '19 at 03:19
  • https://jakevdp.github.io/PythonDataScienceHandbook/04.14-visualization-with-seaborn.html#Bar-plots – steven Feb 15 '19 at 03:23
  • https://www.youtube.com/watch?v=jV24N7SPXEU – steven Feb 15 '19 at 03:30
  • I would like to use a scatter plot to avoid having to aggregate the data. The variables have the same x-axis variable but different y-axis variables. – Graham Streich Feb 15 '19 at 03:46
  • I don't want a line graph by grouping. And I think bar plots grouped by month would complement the scatter plots, but that is a separate question. – Graham Streich Feb 15 '19 at 03:47

2 Answers


Let's start by setting up your data. I actually added another date '2014-12-31' to your example dataset, so that we can verify that our plotting routine works when we have multiple requests received on the same day:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# On matplotlib 3.6 and later this style is named 'seaborn-v0_8'
plt.style.use('seaborn')

dates = np.array(['2012-08-01', '2014-12-31',
                  '2014-12-31', '2015-02-25',
                  '2015-10-15'], dtype='datetime64')

days = np.array([41, 692, 50, 621, 111])

df = pd.DataFrame({'Received' : dates, 'Days' : days})

The dataframe created should approximate what you have. Producing the scatter plot you want is now straightforward:

fig, ax = plt.subplots(1, 1)

ax.scatter(df['Received'], df['Days'], marker='+')
ax.set_xlabel("Received")
ax.set_ylabel("Days")

This gave me the following plot:

[image: the resulting scatter plot]

As noted by @ImportanceOfBeingErnest in the comments below, you need a recent version of pandas for this routine to work.
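To double-check that two requests received on the same day are both drawn rather than merged, one can count the markers the scatter call produced. This is only a minimal sketch under a few assumptions of mine: the `Agg` backend is chosen so it runs without a display, and `.values` is used to hand matplotlib plain NumPy arrays, which may help on older pandas versions. The name `n_drawn` is just illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Same five rows as above; 2014-12-31 appears twice on purpose.
df = pd.DataFrame({
    "Received": pd.to_datetime(["2012-08-01", "2014-12-31", "2014-12-31",
                                "2015-02-25", "2015-10-15"]),
    "Days": [41, 692, 50, 621, 111],
})

fig, ax = plt.subplots()
# .values passes NumPy arrays (datetime64 for the dates) instead of pandas Series
points = ax.scatter(df["Received"].values, df["Days"].values, marker="+")

# One offset per drawn marker; duplicates on the x-axis are kept as separate points.
n_drawn = len(points.get_offsets())
print(n_drawn, "markers drawn for", len(df), "rows")
```

If the two counts match, no aggregation has happened and each request has its own marker.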

jwalton

You hit two cases that don't work: sns.regplot does not work with dates, and plt.plot needs the data passed explicitly through the data argument (it cannot know which data to use from the column names alone).

So any of the following would give you a scatter plot of the data:

  • sns.scatterplot(x="Received", y="days", data=simple_denied, marker="+")
  • sns.scatterplot(x=simple_denied["Received"], y=simple_denied["days"], marker="+")

  • plt.scatter(simple_denied["Received"].values, simple_denied["days"].values, marker="+")

  • plt.plot(simple_denied["Received"].values, simple_denied["days"].values, marker="+", ls="")

  • plt.plot("Received", "days", data=simple_denied, marker="+", ls="")
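As a rough, self-contained illustration of the last variant, the sketch below builds a small stand-in for `simple_denied` (the four rows from the question; the real frame is an assumption on my part) and checks that the `data=` keyword resolves the column names and that `ls=""` suppresses the connecting line. The `Agg` backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the asker's simple_denied frame
simple_denied = pd.DataFrame({
    "Received": pd.to_datetime(["2012-08-01", "2014-12-31",
                                "2015-02-25", "2015-10-15"]),
    "days": [41.0, 692.0, 621.0, 111.0],
})

fig, ax = plt.subplots()
# The column names are looked up in the object passed via data=.
# ls="" turns the line off, so only the "+" markers remain.
(line,) = ax.plot("Received", "days", data=simple_denied, marker="+", ls="")

print(line.get_marker(), repr(line.get_linestyle()), len(line.get_xdata()))
```

Passing the frame as a third positional argument, as in the question, does not have this effect; it has to be the `data` keyword.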

ImportanceOfBeingErnest
  • Thank you, the `plt.scatter(simple_denied["Received"].values, simple_denied["days"].values, marker="+")` works. The sns plots both give me the error `AttributeError: module 'seaborn' has no attribute 'scatterplot'` even though I pip installed the latest seaborn package. The other two plt.plot() create blank graphs. Please let me know if there is something I am missing. – Graham Streich Feb 15 '19 at 17:20
  • Concerning seaborn, yes, `scatterplot` is quite new. So probably your update was unsuccessful. For the `plot` commands, maybe also your matplotlib version is too old? – ImportanceOfBeingErnest Feb 15 '19 at 17:23
  • Do the numbers on the axes correspond to the range of values you would expect from the data? Can you try to reduce the dataset to see if that makes a difference (e.g. using `df.head()` instead of `df`)? Can you try a different `marker`? Also make sure you are actually running the versions you think you are, by printing `print(<module>.__version__)` within the code and comparing to what you expect. – ImportanceOfBeingErnest Feb 15 '19 at 23:45
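
The version check suggested in the last comment could look roughly like the sketch below. The helper name `report_versions` is made up for illustration; it only relies on `importlib` and the conventional `__version__` attribute, and reports a module as "not installed" instead of raising. As far as I know, `seaborn.scatterplot` arrived in seaborn 0.9.0, so an older number printed here would explain the `AttributeError`.

```python
import importlib

def report_versions(module_names):
    """Return {module name: version string} for the interpreter running this code."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = "not installed"
        else:
            # Most packages expose __version__; fall back if one does not.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions

for name, version in report_versions(["pandas", "matplotlib", "seaborn"]).items():
    print(name, version)
```

Running this inside the same script or notebook that produces the plot matters, since `pip` may have installed into a different environment than the one the code uses.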