Adding a trendline to time series plot

Question

I have this plot:

Now I want to add a trend line to it. How do I do that?

The data looks like this:

I wanted to just plot how the median listing price in California has gone up over the years so I did this:

# Get California data
state_ca = []
state_median_price = []
state_ca_month = []
for state, price, date in zip(data['ZipName'], data['Median Listing Price'], data['Month']):
    # Keep only rows whose ZipName contains ", CA"
    if ", CA" in state:
        state_ca.append(state)
        state_median_price.append(price)
        state_ca_month.append(date)

Then I converted the string state_ca_month to datetime:

# Convert state_ca_month to datetime
from datetime import datetime

state_ca_month = [datetime.strptime(x, '%m/%d/%Y %H:%M') for x in state_ca_month]

Then I plotted it:

# Plot trends
import matplotlib.pyplot as plt

plt.figure(num=None, figsize=(12, 6), dpi=80, facecolor='w', edgecolor='k')
plt.plot(state_ca_month, state_median_price)
plt.show()

I thought of adding a trendline or some type of line but I am new to visualization. If anyone has any other suggestions I would appreciate it.

Following the advice in the comments I get this scatter plot

I am wondering if I should further format the data to make a clearer plot to examine.

Why are you using bars instead of a scatter plot for data that you want a trendline for? — Reedinationer, Mar 22 '19 at 17:09
Even if you would have any trendline command - what kind of trend would you expect derived from these data? Do you think this characteristic looks as if one of its most important properties was a meaningful trend? — SpghttCd, Mar 22 '19 at 17:11
@Reedinationer if you look closer: that's not a bar plot - it's a normal line plot... — SpghttCd, Mar 22 '19 at 17:12
@SpghttCd yes that's the term I was looking for. I'm sure everybody could understand my intent with the previous question though... — Reedinationer, Mar 22 '19 at 17:13
@SpghttCd Okay perhaps just a line fitted to the top of the bars? — Wolfy, Mar 22 '19 at 17:14
Wolfy try replacing `plt.plot(state_ca_month, state_median_price)` with `plt.scatter(state_ca_month, state_median_price)` so it only shows points and doesn't draw lines between them. I think this will give a much more clear view of your data to start from — Reedinationer, Mar 22 '19 at 17:46
It appears you are plotting (I assume house) prices for a whole range of zip codes over time. I've found moving averages are a good way to see trends over time, but they'd only make sense for each zip code, i.e. one trend line per zip code. There is a moving average answer [here](https://stackoverflow.com/questions/14313510/how-to-calculate-moving-average-using-numpy). You can generate a trend line for one data series. To get **one** line you need **one** average data series from all the data series. — Tls Chris, Mar 22 '19 at 21:24
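The last comment's suggestion (collapse all zip codes into one series, then smooth it) could look roughly like the sketch below. It assumes the data is loaded in a pandas DataFrame named `data` with the `ZipName`, `Median Listing Price` and `Month` columns used in the question. The function name `monthly_median_with_rolling_mean` and the 12-month window are illustrative assumptions, not part of the original post.

```python
import pandas as pd


def monthly_median_with_rolling_mean(data, window=12):
    """Collapse all California zip codes into one series and smooth it.

    Hypothetical helper: the column names and date format are taken from
    the question; the window length is an arbitrary choice.
    """
    # Keep only rows whose ZipName contains ", CA" (plain substring match)
    ca = data[data["ZipName"].str.contains(", CA", regex=False, na=False)].copy()

    # Parse the month strings using the same format as in the question
    ca["Month"] = pd.to_datetime(ca["Month"], format="%m/%d/%Y %H:%M")

    # One value per month: the median across all California zip codes
    monthly = ca.groupby("Month")["Median Listing Price"].median().sort_index()

    # Moving average over `window` months; early months use what is available
    smoothed = monthly.rolling(window=window, min_periods=1).mean()
    return monthly, smoothed
```

Plotting `smoothed.index` against `smoothed.values` with `plt.plot` then gives a single line instead of one jagged line through every zip code. Whether a median or a mean is the right way to combine zip codes depends on the data, so treat that choice as a starting point.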

Him · Accepted Answer · 2019-03-22T19:25:32.063

If by "trend line" you mean a literal straight line, then you probably want to fit a linear regression to your data. sklearn provides this functionality in Python.

From the example hyperlinked above:

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes = datasets.load_diabetes()


# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# Coefficient of determination (R^2): 1 is perfect prediction
print('Coefficient of determination: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

To clarify, "the overall trend" is not a well-defined thing. Often, by "trend", people mean a straight line that "fits" the data well, and by "fits the data" we in turn mean "predicts the data". So the most common way to get a trend line is to pick the line that best predicts the data you have observed.

We also need to be clear about what "best predicts" means. A very common choice is the line that minimizes the sum of the squared errors between the trend line and the observed data. This is called ordinary least squares linear regression, and it is one of the simplest ways to obtain a trend line. It is the algorithm implemented in sklearn.linear_model.LinearRegression.
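For a single series plotted against dates, the same least-squares line can also be obtained with NumPy alone, as one of the comments points out. The following is a minimal sketch under stated assumptions: `fit_linear_trend` is a hypothetical helper, the prices are synthetic stand-in data, and the dates are converted to day counts with `matplotlib.dates.date2num` so that `np.polyfit` can work with them.

```python
from datetime import datetime, timedelta

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np


def fit_linear_trend(dates, values):
    """Fit values ~ slope * t + intercept by ordinary least squares.

    t is the date in days, shifted so the first observation is at t = 0.
    Returns the slope (per day), the intercept and the fitted trend values.
    """
    t = mdates.date2num(dates)  # datetimes -> float day numbers
    t = t - t.min()             # shift to keep the fit well conditioned
    slope, intercept = np.polyfit(t, values, deg=1)
    trend = slope * t + intercept
    return slope, intercept, trend


if __name__ == "__main__":
    # Synthetic data (an assumption): a rising price with random noise
    rng = np.random.default_rng(0)
    dates = [datetime(2015, 1, 1) + timedelta(days=30 * i) for i in range(48)]
    days = mdates.date2num(dates) - mdates.date2num(dates[0])
    prices = 400_000 + 100 * days + rng.normal(0, 20_000, size=len(dates))

    slope, intercept, trend = fit_linear_trend(dates, prices)

    fig, ax = plt.subplots(figsize=(12, 6))
    ax.scatter(dates, prices, s=10, label="observed")
    ax.plot(dates, trend, color="red",
            label=f"trend: {slope * 365:.0f} per year")
    ax.legend()
    plt.show()
```

With the question's data, `dates` and `values` would be `state_ca_month` and `state_median_price`. Because those lists mix many zip codes per month, the fitted line describes the pooled points rather than any single zip code.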

I am not sure how this helps, I am not trying to predict anything. I just wanted to visualize the data to show an overall trend or whatever. — Wolfy, Mar 22 '19 at 18:12
Yes, correct so far. I just don't understand in these times where _ml_, _ai_, _dl_...... are not only in everybody's speech, but additionally are sold as the versatile solution for everything... Well however, I don't understand why on earth nowadays even for a _simple linear regression_ - a _machine learning lib_ is loaded and propagated?!! Numpy has done this fine for years, scipy does it of course anyway - I don't want to offend anyone, for sure, but this is my first and a fitting moment to ask this crazy little thing called internet: WTF...? — SpghttCd, Mar 22 '19 at 19:08
