I've tried everything and looked through other threads here, but I can't find how to smooth a line in a matplotlib chart. Most tutorials assume both axes have numeric values, while in my case the x axis holds date values...

Is this possible? If not, is there any other visualization library that could allow me to do this?

Here is my code:

import matplotlib.pyplot as plt

date = ["Jan", "Feb", "Mar", "Apr", "May"]
value = [4, 12, 15, 7, 25]
plt.plot(date, value)

plt.show()

This currently outputs a line chart with straight segments between the points:

[image: current plot]

I want the line to be a smooth curve instead:

[image: desired plot]

Thanks a lot!

  • Does this answer your question? [Plot smooth line with PyPlot](https://stackoverflow.com/questions/5283649/plot-smooth-line-with-pyplot) – Tom Jun 22 '21 at 16:30
  • No :( Maybe I'm a little bit stupid, but that's one of the threads I've tried. Since my x values are qualitative (dates), the code gives me this error: `ValueError: could not convert string to float: 'Jan'` – Nicolás Múnera Jun 22 '21 at 16:40
  • You could perhaps convert the months to numbers first? Also, this question does not seem to be about pandas, so I have removed the tag. If there is a pandas aspect to this question, please [edit](https://stackoverflow.com/posts/68087386/edit) your question to reflect this and add the tag back. – Henry Ecker Jun 22 '21 at 16:43
  • Maybe run some code on the data that creates points in between each data point and therefore smooths it out. – SamTheProgrammer Jun 22 '21 at 16:46
  • How can I convert them to numbers? I need the x axis labels to remain text :S – Nicolás Múnera Jun 22 '21 at 16:47
  • You can use just the numbers `0, 1, 2, ...` for the spline function. E.g. `date_num = np.arange(len(date))`. And after drawing the curve, call `plt.xticks(date_num, date)`. – JohanC Jun 22 '21 at 16:50

1 Answer

I retracted my close vote because I had missed that you are plotting against strings on the x-axis, which makes it harder to interpolate between them. As others have suggested, the key is to derive numbers from your date strings and use those for interpolating and plotting. Once you have done so, the linked question (Plot smooth line with PyPlot) is still a good framework to follow.

import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import make_interp_spline

# original data
date = ["Jan", "Feb", "Mar", "Apr", "May"]
value = [4,12,15,7,25]

# create integers from strings
idx = range(len(date))
xnew = np.linspace(min(idx), max(idx), 300)

# interpolation
spl = make_interp_spline(idx, value, k=3)
smooth = spl(xnew)

# plotting, and tick replacement
plt.plot(xnew, smooth)
plt.xticks(idx, date)
plt.show()

[image: resulting plot]

`idx` holds the values 0, 1, 2, 3, 4 and is used for both interpolation and plotting. At the end, the call to `plt.xticks` labels those tick positions with the date strings.
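If you need this for more than one chart, the same steps can be wrapped in a small function. This is only a sketch: `smooth_labeled_series` is a hypothetical helper name of mine, not something from matplotlib or scipy, and it assumes the labels are evenly spaced and already in plotting order.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline


def smooth_labeled_series(labels, values, num_points=300, k=3):
    """Hypothetical helper: place labels at 0, 1, 2, ... and fit a spline.

    Returns the integer tick positions plus the dense x/y arrays of the curve.
    """
    positions = np.arange(len(labels))
    # an interpolating spline of degree k needs at least k + 1 data points
    if len(values) <= k:
        raise ValueError("need more than k data points for a degree-k spline")
    x_dense = np.linspace(positions[0], positions[-1], num_points)
    y_dense = make_interp_spline(positions, values, k=k)(x_dense)
    return positions, x_dense, y_dense


date = ["Jan", "Feb", "Mar", "Apr", "May"]
value = [4, 12, 15, 7, 25]

positions, x_dense, y_dense = smooth_labeled_series(date, value)

fig, ax = plt.subplots()
ax.plot(x_dense, y_dense)       # smooth curve
ax.plot(positions, value, "o")  # original points, for comparison
ax.set_xticks(positions)        # ticks at 0, 1, 2, ...
ax.set_xticklabels(date)        # ... labelled with the month strings
```

Call `plt.show()` afterwards to display the figure. Plotting the original points on top is optional, but it makes it easy to see that the curve passes through the data and where it overshoots between points.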

The above is mainly based on the comments (from HenryEcker and JohanC). What I wanted to add is another way of doing the interpolation, which is to convert your strings to actual datetimes:

import matplotlib.dates as mdates # for formatting
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline
import pandas as pd # for working with dates

# instead of ["Jan", "Feb", "Mar", "Apr", "May"], create datetime objects
date = pd.date_range('01-01-2020', freq='MS', periods=5)
# DatetimeIndex(['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01', '2020-05-01'], dtype='datetime64[ns]', freq='MS')

value = [4,12,15,7,25]

# now make new x positions using a date range, instead of linspace 
# see here: https://stackoverflow.com/a/50728640/13386979
xnew = pd.date_range(date.min(), date.max(), periods=300)

# interpolation
spl = make_interp_spline(date, value, k=3)
smooth = spl(xnew)

# plotting
plt.plot(xnew, smooth)

# using mdates to get the x-axis formatted correctly
months = mdates.MonthLocator()
fmt = mdates.DateFormatter('%b') # %b -> Month as locale’s abbreviated name
ax = plt.gca()
ax.xaxis.set_major_locator(months)
ax.xaxis.set_major_formatter(fmt)
plt.show()

[image: resulting plot]

This latter approach involves a little extra formatting work (and imports), but it is more explicit about plotting temporal data, which I find more intuitive to work with. For example, if you have multiple time series you can easily plot them side by side, you can refer to specific dates more easily in the code, and you don't have to remember which indices refer to which dates (e.g. that March is 2 in this example).
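A variant of the datetime approach, in case you prefer to hand the spline plain floats rather than datetime values: convert the dates to Matplotlib's date numbers with `mdates.date2num`, interpolate on those, and let the date locator and formatter label the axis. This is a sketch under that assumption, not a required change; the variable names `date_num` and `xnew_num` are mine.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from scipy.interpolate import make_interp_spline

date = pd.date_range("2020-01-01", freq="MS", periods=5)
value = [4, 12, 15, 7, 25]

# convert datetimes to Matplotlib date numbers (floats measured in days)
date_num = mdates.date2num(date.to_numpy())

# dense x positions between the first and last date, as floats
xnew_num = np.linspace(date_num.min(), date_num.max(), 300)

# interpolation on purely numeric data
spl = make_interp_spline(date_num, value, k=3)
smooth = spl(xnew_num)

# plotting: the x values are date numbers, so the date locator and
# formatter can turn them back into month labels
fig, ax = plt.subplots()
ax.plot(xnew_num, smooth)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
```

Call `plt.show()` afterwards to display the figure. Unlike the `0, 1, 2, ...` index version, the x positions here reflect the real spacing between dates, so months of different lengths are not drawn equally wide.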

Tom