After finally working out my data set and being able to graph it, I have been trying to use linear regression to fit the curve. I have tried a few methods, but none have given me any results; I think this is due to how my data has been filtered. Here is my code:

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd  # needed for pd.read_csv below
from pandas import DataFrame
from sklearn.linear_model import LinearRegression
from matplotlib.pyplot import figure

figure(num=None, figsize=(100, 100), dpi=100, facecolor='w', edgecolor='k')

plt.rc('font', size=100)          # controls default text sizes
plt.rc('axes', titlesize=100)     # fontsize of the axes title
plt.rc('axes', labelsize=100)    # fontsize of the x and y labels
plt.rc('xtick', labelsize=30)    # fontsize of the tick labels
plt.rc('ytick', labelsize=60)    # fontsize of the tick labels
plt.rc('legend', fontsize=100)    # legend fontsize
plt.rc('figure', titlesize=100)

plt.xticks(rotation=90)


ds = pd.read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
df = DataFrame(ds, columns = ['date', 'location', 'new_deaths', 'total_deaths'])

df = df.replace(np.nan, 0)

US = df.loc[df['location'] == 'United States']


plt.plot_date(US['date'],US['new_deaths'], 'blue', label = 'US', linewidth = 5)
#plt.plot_date(US['date'],US['total_deaths'], 'red', label = 'US', linewidth = 5)

#linear_regressor = LinearRegression()  # create object for the class
#linear_regressor.fit(US['date'], US['new_deaths'])  # perform linear regression
#Y_pred = linear_regressor.predict(X)  # make predictions

#m , b = np.polyfit(x = US['date'], y = US['new_deaths'], deg = 1)






plt.title('New Deaths per Day In US')
plt.xlabel('Time')
plt.ylabel('New Deaths')
plt.legend()
plt.grid()
plt.show()


I know this question has been asked thousands of times, so if there's a post out there that I didn't come across, please link it to me. Thank you all! :D

  • When you say `none have given me any results`, what do you mean? I think it is better to share your experiences and the results. Right now, it is not very clear what your question is. – alift May 19 '20 at 01:33
  • I have tried a couple methods found here: 1) https://scipy-lectures.org/packages/scikit-learn/auto_examples/plot_linear_regression.html , 2) https://stackoverflow.com/questions/6148207/linear-regression-with-matplotlib-numpy , 3)https://towardsdatascience.com/linear-regression-in-6-lines-of-python-5e1d0cd05b8d , but when I try to pass my data set into the linear regression functions I always get "TypeError: can only concatenate str (not "float") to str". – Tiago Costa May 19 '20 at 01:36
  • Basically, how would I be able to get a linear regression curve fit for the way I have my data sets filtered? @alift – Tiago Costa May 19 '20 at 01:37
  • You should ask your question about the error you are getting; which is obviously about converting str features to float before fitting the LR. Just Google your error, and the first result is https://stackoverflow.com/questions/52796600/typeerror-can-only-concatenate-str-not-float-to-str ; does this help? – alift May 19 '20 at 01:44
  • Somewhat, yes. I have tried converting from str to float, but I'm not sure what the formatting would look like because of how I have my data filtered. Agreed though, my question should have been about how to format the conversion from str to float. – Tiago Costa May 19 '20 at 01:48
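For readers who land on the same "str to float" problem: the `date` column comes out of the CSV as strings, so the fitting functions have nothing numeric to work with on the x axis. A minimal sketch of one way around this, assuming a tiny made-up frame named `us` in place of the real filtered data, is to parse the dates and count days since the first date before calling `np.polyfit`. The column values below are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the filtered US frame: dates arrive as strings
us = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05"],
    "new_deaths": [1.0, 3.0, 5.0, 7.0, 9.0],
})

# Parse the strings into real timestamps
dates = pd.to_datetime(us["date"])

# Express each date as a number: whole days elapsed since the first date
days = (dates - dates.min()).dt.days.to_numpy(dtype=float)

# Fit a degree-1 polynomial (a straight line) to the numeric x values
slope, intercept = np.polyfit(days, us["new_deaths"].to_numpy(), deg=1)

# Evaluate the fitted line at every x value
trend = slope * days + intercept
```

Here `slope` is in deaths per day and `intercept` is the fitted value on the first date, so the line can be drawn against the original dates with something like `plt.plot(dates, trend)`.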

1 Answer

With sklearn's LinearRegression, you can do this to fit the regression:

regr = LinearRegression()
regr.fit(US['date'].values.reshape(-1, 1), US['new_deaths'])

To plot it:

# plot the original points
plt.plot(US['date'], US['new_deaths'])

# plot the fitted line. To do so, first generate an input set containing
# only the max and min limits of the x range
trendline_x = np.array([US['date'].min(), US['date'].max()]).reshape(-1, 1)
# predict the y values of these two points
trendline_y = regr.predict(trendline_x)
# plot the trendline
plt.plot(trendline_x, trendline_y)

If you are only after the visual, Seaborn's lmplot is a handy and nice-looking alternative.

xcmkz
  • I'm getting an error from `regr.fit(US['date'], US['new_deaths'])` saying "could not convert string to float: '2019-12-31'" @xcmkz – Tiago Costa May 19 '20 at 01:43
  • OK you will need to convert the dates from strings to an actual continuous representation of dates, for example `datetime`. To do so: use `pd.to_datetime(US['date'])` instead of `US['date']`. – xcmkz May 19 '20 at 01:48
  • Also I updated the `regr.fit` line in my answer -- I forgot that the X matrix needs to be of shape (N, k) where k is the number of features. In other words, the input should be of shape (N, 1) instead of (N,). sorry – xcmkz May 19 '20 at 01:51
  • Got it thank you! I am getting the error: "ValueError: could not convert string to float: '2019-12-31'" still, will do some more research. Once my values are converted to floats it looks like your code should work. – Tiago Costa May 19 '20 at 01:55
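
As a follow-up to the comments above, here is a hedged sketch of how the conversion to floats might look before calling `regr.fit`. It assumes a small invented frame named `us` rather than the real download, and it uses each date's proleptic Gregorian ordinal (via `pd.Timestamp.toordinal`) as the numeric feature. That choice is an assumption for illustration; any monotonic numeric encoding of the dates would serve.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the filtered US frame (values invented)
us = pd.DataFrame({
    "date": ["2019-12-31", "2020-01-01", "2020-01-02", "2020-01-03"],
    "new_deaths": [0.0, 10.0, 20.0, 30.0],
})

# Strings -> timestamps -> integer day ordinals -> float column of shape (N, 1)
dates = pd.to_datetime(us["date"])
X = dates.map(pd.Timestamp.toordinal).to_numpy(dtype=float).reshape(-1, 1)
y = us["new_deaths"].to_numpy()

# Fit the regression on the numeric feature
regr = LinearRegression()
regr.fit(X, y)

# Predict only at the two ends of the x range to get the trendline
trendline_x = np.array([X.min(), X.max()]).reshape(-1, 1)
trendline_y = regr.predict(trendline_x)
```

To draw the line on the same date axis as the data, the two predicted values can be plotted against the matching dates, for example `plt.plot([dates.min(), dates.max()], trendline_y)`.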