Is it possible to do a polynomial regression line on a scatter() in matplotlib?

This is my graph: https://i.stack.imgur.com/3ra9x.jpg

    alg_n = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4...]
    orig_hc_runtime = [0.01, 0.02, 0.03, 0.04, 0.04, 0.04, 0.05, 0.09...]

    plt.scatter(alg_n, orig_hc_runtime, label="Orig HC", color="b", s=4)
    plt.scatter(alg_n, mod_hc_runtime, label="Mod HC", color="c", s=4)
    ...

    x_values = [x for x in range(5, n_init+2, 2)]
    y_values = [y for y in range(0, 10, 2)]

    plt.xlabel("Number of Queens")
    plt.ylabel("Time (sec)")
    plt.title("Algorithm Performance: Time")
    plt.xticks(x_values)
    plt.yticks(y_values)
    plt.grid(linewidth="1", color="white")
    plt.legend()
    plt.show()

Is it possible to have regression lines for each data set? If so, could you please explain how to do it?

Joshua

2 Answers

I'm not sure whether it can be done using matplotlib alone, but you can always compute the regression separately and plot the result. Here is an example that uses scikit-learn to compute the regression line.

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # Sample data
    x = [1, 2, 3, 4, 5, 8, 10]
    y = [1.1, 3.8, 8.5, 16, 24, 65, 99.2]

    # Expand x into polynomial features up to degree 2, then fit a linear model
    model = make_pipeline(PolynomialFeatures(2), LinearRegression())
    model.fit(np.array(x).reshape(-1, 1), y)  # scikit-learn expects a 2-D feature array

    # Evaluate the fitted curve at x = 0, 1, ..., 10
    x_reg = np.arange(11)
    y_reg = model.predict(x_reg.reshape(-1, 1))

    plt.scatter(x, y)
    plt.plot(x_reg, y_reg)
    plt.show()

Output: plot of the 2nd degree polynomial regression
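If you would rather not add scikit-learn as a dependency, the same kind of curve can probably be obtained with NumPy alone, since `np.polyfit` performs a least-squares polynomial fit and `np.polyval` evaluates it. The following is only a minimal sketch under that assumption: `poly_fit_curve` is a hypothetical helper name (it is not part of NumPy or matplotlib), the degree of 2 is an arbitrary choice, and the data is the sample data from the example above.

```python
import numpy as np
import matplotlib.pyplot as plt


def poly_fit_curve(x, y, degree=2, num=100):
    """Return (x_line, y_line) for a least-squares polynomial fit of y on x.

    Hypothetical helper for illustration; not a matplotlib or NumPy function.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)            # coefficients, highest power first
    x_line = np.linspace(x.min(), x.max(), num)  # evenly spaced points over the data range
    y_line = np.polyval(coeffs, x_line)          # evaluate the fitted polynomial
    return x_line, y_line


if __name__ == "__main__":
    # Sample data copied from the scikit-learn example above.
    x = [1, 2, 3, 4, 5, 8, 10]
    y = [1.1, 3.8, 8.5, 16, 24, 65, 99.2]

    x_line, y_line = poly_fit_curve(x, y, degree=2)

    plt.scatter(x, y, label="data", color="b", s=4)
    plt.plot(x_line, y_line, label="degree-2 fit", color="b")
    plt.legend()
    plt.show()
```

To get one regression line per data set, call the helper once for each scatter() series (for example once with orig_hc_runtime and once with mod_hc_runtime) and plot each result in the matching color.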

Seljuk Gulcan

I would advise you to use the Seaborn library. It is built on top of matplotlib and has many statistical plotting routines. Have a look at the examples for regplot and lmplot: http://seaborn.pydata.org/tutorial/regression.html#functions-to-draw-linear-regression-models

In your case, you could do something like:

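    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    df = pd.DataFrame.from_dict({"Number of Queens": [1, 1, 1, 2, 2, 2, 3,
                                                      3, 3, 4, 4, 4],
                                 "Time (sec)": [0.01, 0.02, 0.03, 0.04, 0.04, 0.04,
                                                0.05, 0.09, 0.12, 0.14, 0.15, 0.16]})
    # Recent seaborn versions require keyword arguments here.
    # order=1 fits a straight line; order=2 or higher fits a polynomial.
    sns.lmplot(data=df, x="Number of Queens", y="Time (sec)", order=1)
    plt.show()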

Resulting plot

If you want regression lines for different groups, add a column with the group labels and pass its name to the hue parameter of lmplot.
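As a rough sketch of that grouping approach: the data is put in long format, with one row per measurement and an extra column naming the algorithm. The "Algorithm" column name is an assumption made for this example, and the "Mod HC" timings are placeholder values invented purely for illustration, since the question does not show them. `order=2` requests a quadratic fit.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

n_queens = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
orig_hc = [0.01, 0.02, 0.03, 0.04, 0.04, 0.04,
           0.05, 0.09, 0.12, 0.14, 0.15, 0.16]
# Placeholder timings, made up for illustration only (not from the question).
mod_hc = [0.01, 0.01, 0.02, 0.02, 0.03, 0.03,
          0.04, 0.05, 0.06, 0.07, 0.08, 0.09]

# Long format: stack both series and label each row with its group.
df = pd.DataFrame({
    "Number of Queens": n_queens * 2,
    "Time (sec)": orig_hc + mod_hc,
    "Algorithm": ["Orig HC"] * len(n_queens) + ["Mod HC"] * len(n_queens),
})

# hue draws one scatter series and one regression line per group.
grid = sns.lmplot(data=df, x="Number of Queens", y="Time (sec)",
                  hue="Algorithm", order=2)
plt.show()
```

lmplot returns a FacetGrid, so titles and axis labels can be adjusted afterwards through `grid` if needed.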

Rob