How can I make the horizontal lines below follow through the data points?

I have some pandas DataFrame objects which contain actual and predicted values over time (the image below shows actual (blue) vs. predicted (orange) temperature, plotted with seaborn). I would like a line of best fit to pass through both the actual and the predicted values, so that I can easily compare whether the predicted values follow the trend of the actual ones. In the example below it is fairly clear that they do, but not all of the datasets are as straightforward.

So far, seaborn seems to offer very limited support for such non-linear lines of best fit. I used lmplot
sb.lmplot(x='Epoch', y='HOURLYDRYBULBTEMPC', hue='Test', data=data, size=20)
to draw the graph below, which draws linear regression lines straight through the middle. There is an order parameter which uses numpy.polyfit, but a polynomial fit isn't capable of following a pattern like this.

Is there something I can use that will do this?
If not, I could write some code which splits the x axis into n buckets, takes the average of each bucket, and plots those averages on the graph with a line drawn between them; but even once I have calculated those points, I still haven't come across a way to plot/overlay them on the graph.
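A minimal sketch of the bucketing idea described above, assuming pandas and matplotlib are available. The helper names `bucket_means` and `overlay_trend`, the default of 20 buckets, and the synthetic `demo` frame are made up for illustration; only the column names come from the lmplot call in the question.

```python
import numpy as np
import pandas as pd


def bucket_means(df, x, y, n_buckets=20):
    """Split the x range into n_buckets equal-width buckets and average x and y in each.

    Hypothetical helper, not part of pandas or seaborn.
    """
    # labels=False yields an integer bucket index per row; empty buckets simply never appear.
    bucket = pd.cut(df[x], bins=n_buckets, labels=False).to_numpy()
    return df.groupby(bucket)[[x, y]].mean().reset_index(drop=True)


def overlay_trend(df, x, y, hue, n_buckets=20):
    """Scatter the raw points and draw one bucket-average line per hue level on the same Axes."""
    import matplotlib.pyplot as plt  # imported lazily so bucket_means works without matplotlib

    fig, ax = plt.subplots()
    for label, group in df.groupby(hue):
        ax.scatter(group[x], group[y], s=5, alpha=0.3, label=f"{label} (points)")
        trend = bucket_means(group, x, y, n_buckets)
        ax.plot(trend[x], trend[y], linewidth=2, label=f"{label} (bucket mean)")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return ax


# Synthetic stand-in for the real data (an assumption): a seasonal signal plus noise.
rng = np.random.default_rng(0)
epoch = np.arange(500)
seasonal = 10 + 8 * np.sin(2 * np.pi * epoch / 250)
demo = pd.DataFrame({
    "Epoch": np.tile(epoch, 2),
    "HOURLYDRYBULBTEMPC": np.concatenate([
        seasonal + rng.normal(0, 2, epoch.size),  # "actual"
        seasonal + rng.normal(0, 1, epoch.size),  # "predicted"
    ]),
    "Test": ["actual"] * epoch.size + ["predicted"] * epoch.size,
})

# Bucket averages for one of the two series.
trend = bucket_means(demo[demo["Test"] == "actual"], "Epoch", "HOURLYDRYBULBTEMPC")
```

Calling `overlay_trend(demo, "Epoch", "HOURLYDRYBULBTEMPC", "Test")` would then put the points and both averaged lines on one Axes. Fewer buckets give a smoother but coarser line, so `n_buckets` probably needs tuning per dataset.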

Most of the other Q&As on this topic talk about either linear regression lines or fitting a specific curve type to the data, but that would be too time-consuming given the number of different datasets this needs to be applied to, and not all of them will follow a nice curve that is easily modeled.

Any help is appreciated, thanks.

[Image: Temperature]

parrowdice
  • Is there any reason why the fitted data can't be stored in the same dataframe as the original, and plotted later? This seems easier than relying on a plotting package like seaborn to make a complex fit, which will occasionally fail. For seasonal data, I'd try to fit with a sine function, using standard scipy.optimize.leastsq methods. – Mark Teese Mar 14 '18 at 12:42
  • @MarkTeese When you say 'plot it later', that is an option, as long as it's overlaid on the same graph, which I haven't found how to do yet. And choosing specific functions to model the curve isn't feasible for this situation as the datasets are many different shapes/types; I'm looking for a one size fits all approach (such as a moving average) – parrowdice Mar 14 '18 at 12:59
  • It seems you are asking for a program that, given a dataset, identifies the best model / fit function and performs the fit to the data. This is hardly possible, because to find a good model automatically, at least some underlying knowledge about the data needs to be present. What would be possible is that you select a subset of possible models, perform a fit for each, and then select the one with the smallest residuals. But as it stands, this question is just too broad. – ImportanceOfBeingErnest Mar 14 '18 at 13:07
  • To overlay plots on the same axes, create the ax first in matplotlib (e.g. `fig, ax = plt.subplots()`), and then plot specifically to that ax with an axes-level function such as `sb.regplot(y=.., x=.., data=.., ax=ax)` (`sb.lmplot` is figure-level and does not take an `ax` argument). With this method, you can plot first the data, and then the fitted curve, as you like. This works for whole dataframes, too, e.g. `df.plot(ax=ax)`. Since seaborn is built on top of matplotlib, it's kinda hard to navigate until you have mastered the matplotlib basics. – Mark Teese Mar 14 '18 at 13:14
  • Oh, so are you asking how to plot a moving average? That is surely already asked and answered somewhere on this page, no need to ask a new question here. – ImportanceOfBeingErnest Mar 14 '18 at 13:36
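
The moving-average approach raised in the comments above could be sketched roughly like this: store the smoothed values in the same dataframe, then draw points and lines on one shared Axes. The helper names `add_rolling_mean` and `plot_with_rolling_mean`, the `_smooth` column suffix, and the window of 24 rows (about one day, if the rows really are hourly) are assumptions for illustration. The plotting part also assumes a seaborn version that provides `scatterplot` and `lineplot` (0.9 or later).

```python
import numpy as np
import pandas as pd


def add_rolling_mean(df, x, y, hue, window=24):
    """Return a copy of df sorted by x, with a centered rolling mean of y per hue level.

    Hypothetical helper; the result is stored in a new column named y + "_smooth".
    """
    out = df.sort_values(x).copy()
    out[y + "_smooth"] = out.groupby(hue)[y].transform(
        # min_periods=1 keeps values at both ends of the series instead of NaN
        lambda s: s.rolling(window, center=True, min_periods=1).mean()
    )
    return out


def plot_with_rolling_mean(df, x, y, hue, window=24):
    """Draw the raw points and the per-hue rolling mean on the same Axes."""
    import matplotlib.pyplot as plt  # lazy imports: only needed for plotting
    import seaborn as sb

    smoothed = add_rolling_mean(df, x, y, hue, window)
    fig, ax = plt.subplots()
    # Axes-level seaborn functions accept ax=, so both layers share one plot.
    sb.scatterplot(x=x, y=y, hue=hue, data=smoothed, ax=ax, s=10, alpha=0.3, legend=False)
    # estimator=None draws the smoothed values as-is, without aggregating them.
    sb.lineplot(x=x, y=y + "_smooth", hue=hue, data=smoothed, ax=ax, estimator=None)
    return ax
```

With the question's data this would be called as `plot_with_rolling_mean(data, 'Epoch', 'HOURLYDRYBULBTEMPC', 'Test')`. A rolling mean needs no model of the curve, which is why it suits datasets of many different shapes, but the window size still has to be chosen per dataset.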

0 Answers