
I have a DataFrame that looks like this:

Date_1 Date_2 Date_Diff
2017-02-14 2017-03-09 23 days
2019-07-16 2019-09-09 55 days
2014-10-29 2018-04-06 1255 days

where Date_1 and Date_2 are datetime objects and Date_Diff is a timedelta column holding the difference between the two dates. I want to plot the frequency of Date_Diff (e.g. how often the gap between Date_1 and Date_2 equals x), so I created a simple time series plot:

df_final['Date_Diff'].plot(label='Original',color='orange')
plt.show()

and I got the following plot:

[Image: time-series plot]

However, I don't think I did it correctly, because my y-axis contains negative values. Can someone please explain what my plot is showing and/or how I can fix it?

Thanks

wwii
Mitchell
    do you have negative values in your `Date_Diff` column? – MattR Mar 15 '22 at 14:58
    Also your y-axis goes up to `1e17`. Maybe try plotting a subset of your data, for example the three rows you shared above. – Steve Mar 15 '22 at 15:03
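As a rough check on the `1e17` scale mentioned in the comment above: pandas stores a timedelta as an integer count of nanoseconds, and the 1255-day gap from the question is about 1e17 nanoseconds. That is consistent with the y-axis showing raw nanosecond counts, although that reading is an assumption rather than something the question confirms. A minimal sketch of the arithmetic:

```python
import pandas as pd

# The largest gap shown in the question's sample rows
gap = pd.Timedelta(days=1255)

# Timedelta.value is the duration as an integer number of nanoseconds
ns = gap.value
print(ns)           # 1255 days * 86400 s/day * 1e9 ns/s
print(f"{ns:.3e}")  # same number in scientific notation
```

If that reading is right, converting to whole days before plotting gives a y-axis in units that are easier to read.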

1 Answer


I would make a new column (or a separate pandas series if you don't want to add a new column) which is the exact numeric value of what you want to plot:

import pandas as pd
import matplotlib.pyplot as plt

# pd.datetime was removed in pandas 2.0, so parse the dates with pd.to_datetime
df = pd.DataFrame(
     {'Date_1': pd.to_datetime(['2017-02-14', '2019-07-16', '2014-10-29']),
      'Date_2': pd.to_datetime(['2017-03-09', '2019-09-09', '2018-04-06'])})

df['Date_Diff'] = df['Date_2'] - df['Date_1']

# Numeric value of what we want to plot: the absolute gap in whole days
df['Days_Diff'] = df['Date_Diff'].dt.days.abs()

This gives us:

      Date_1     Date_2 Date_Diff  Days_Diff
0 2017-02-14 2017-03-09   23 days         23
1 2019-07-16 2019-09-09   55 days         55
2 2014-10-29 2018-04-06 1255 days       1255

And you can use the plotting command you used before:

df['Days_Diff'].plot()
plt.show()

Note that I included abs in the definition of df['Days_Diff'] in case Date_2 is before Date_1 (which might be the case in your dataset, and would explain the negative values on your y-axis). You might want to remove it, though, so that negative gaps stay visible and point you to potential errors in your dataset.
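If you would rather inspect the reversed rows than hide them with abs, one option is to keep a signed day count and filter on it. This is only a sketch: the two rows below are made up for illustration, and `Days_Signed` and `reversed_rows` are hypothetical names, not columns from the question.

```python
import pandas as pd

# Made-up rows for illustration; the second row has Date_2 before Date_1
df = pd.DataFrame({
    'Date_1': pd.to_datetime(['2017-02-14', '2019-09-09']),
    'Date_2': pd.to_datetime(['2017-03-09', '2019-07-16']),
})
df['Date_Diff'] = df['Date_2'] - df['Date_1']

# Signed gap in whole days: a negative value means Date_2 precedes Date_1
df['Days_Signed'] = df['Date_Diff'].dt.days

# Rows worth checking by hand before deciding whether abs is appropriate
reversed_rows = df[df['Days_Signed'] < 0]
print(reversed_rows)
```

Whether such rows are data-entry errors or legitimate values depends on what the two dates mean in your dataset.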

Edit:

If you want to plot the frequency that certain differences occur, you might want to instead use a histogram, or use an example from one of the answers to this question.
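A minimal sketch of the histogram idea, assuming you already have the gaps as whole days. The values below are invented so that some gaps repeat, and the bin count of 20 is an arbitrary choice you would tune for your data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative gaps in whole days (invented values, not the asker's data)
days_diff = pd.Series([23, 55, 1255, 23, 55, 23], name='Days_Diff')

# Exact frequency of each gap: how many rows have a gap of exactly x days
counts = days_diff.value_counts().sort_index()
print(counts)

# Histogram: groups nearby gaps into bins and plots how many rows fall in each
ax = days_diff.plot(kind='hist', bins=20)
ax.set_xlabel('Gap between Date_1 and Date_2 (days)')
ax.set_ylabel('Number of rows')
plt.show()
```

value_counts answers "how often is the gap exactly x", while the histogram is usually easier to read when the gaps span a wide range, as they do in the question (23 to 1255 days).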

Steve