I'm banging my head trying to figure out what I'm doing wrong here:
import numpy as np
import pandas as pd

df = pd.read_csv('data.csv')
# Determine slope and intercept
area = np.array(df['area'])
rooms = np.array(df['rooms'])
balcony = np.array(df['balcony'])
age = np.array(df['age'])
price = np.array(df['price'])
features = np.array(df[['area', 'rooms', 'balcony', 'age']])
rows, cols = features.shape
limit = 1000
learn = 0.00001
slope = np.zeros((cols,))
intercept = 0
history = []
for i in range(limit):
    residual = (np.dot(features, slope) + intercept) - price
    derivative_of_slope = np.zeros((cols,))
    derivative_of_intercept = residual.mean()
    for j in range(cols):
        derivative_of_slope[j] = np.dot(features.take(j, axis=1), residual) # I think the issue is here and I'm overlooking something
    derivative_of_slope /= rows
    history.append({'cost': derivative_of_intercept, 'intercept': intercept, 'slope': slope})
    slope = slope - learn * derivative_of_slope
    intercept = intercept - learn * derivative_of_intercept
history = pd.DataFrame(history, columns=['cost', 'intercept', 'slope'])
history[-1:]
Here is the sample output of the features dataset:
The issue I'm having is that, for some reason, the 2nd, 3rd, and 4th slope parameters don't converge. I played around with the learning rate and the number of iterations and got somewhat close a few times, but never all the way. Even the closest slopes I found still give predictions that are more than 10k too high.
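One way to test the suspicion about the per-column derivative line is to compare it against two independent computations of the same gradient: a single matrix product and a central finite-difference estimate. The sketch below does this on hypothetical toy data (not the `data.csv` from this post), and the helper names `loop_gradient`, `vectorized_gradient`, and `cost` are made up for illustration. It assumes the quantity being minimized is half the mean squared error.

```python
import numpy as np

def loop_gradient(features, residual):
    """Per-column gradient, mirroring the loop in the question."""
    rows, cols = features.shape
    grad = np.zeros((cols,))
    for j in range(cols):
        grad[j] = np.dot(features.take(j, axis=1), residual)
    return grad / rows

def vectorized_gradient(features, residual):
    """The same quantity written as one matrix product."""
    return features.T @ residual / features.shape[0]

def cost(features, price, slope, intercept):
    """Half mean squared error (assumed objective)."""
    residual = features @ slope + intercept - price
    return 0.5 * np.mean(residual ** 2)

# Hypothetical toy data, only used to exercise the gradient code
rng = np.random.default_rng(0)
toy_features = rng.uniform(0.0, 10.0, size=(50, 4))
toy_price = rng.uniform(0.0, 100.0, size=50)
toy_slope = rng.normal(size=4)
toy_intercept = 0.5

residual = toy_features @ toy_slope + toy_intercept - toy_price
grad_loop = loop_gradient(toy_features, residual)
grad_vec = vectorized_gradient(toy_features, residual)

# Central finite differences as an independent check of the slope gradient
eps = 1e-6
grad_fd = np.zeros(4)
for j in range(4):
    step = np.zeros(4)
    step[j] = eps
    plus = cost(toy_features, toy_price, toy_slope + step, toy_intercept)
    minus = cost(toy_features, toy_price, toy_slope - step, toy_intercept)
    grad_fd[j] = (plus - minus) / (2 * eps)

loop_matches_vectorized = np.allclose(grad_loop, grad_vec)
loop_matches_finite_diff = np.allclose(grad_loop, grad_fd, rtol=1e-4, atol=1e-4)
```

If both flags come out `True`, the per-column line is computing the gradient of that objective correctly, and the cause of the slow convergence would have to be looked for elsewhere.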
Example of my determined slope & intercept:
And the slope as determined by Sklearn:
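Assuming the Sklearn reference came from an ordinary least-squares fit (for example `LinearRegression`), a comparable reference can be computed with NumPy alone. The sketch below swaps in `np.linalg.lstsq` for the Sklearn call and runs on hypothetical toy data with a known slope and intercept; `true_slope`, `true_intercept`, and the `toy_*` names are illustrative and not taken from `data.csv`.

```python
import numpy as np

# Hypothetical toy data with a known, noise-free linear relationship
rng = np.random.default_rng(1)
toy_features = rng.uniform(0.0, 10.0, size=(200, 4))
true_slope = np.array([3.0, -2.0, 0.5, 1.5])
true_intercept = 4.0
toy_price = toy_features @ true_slope + true_intercept

# Append a column of ones so the intercept is fitted together with the slopes
design = np.column_stack([toy_features, np.ones(len(toy_features))])
solution, *_ = np.linalg.lstsq(design, toy_price, rcond=None)
ref_slope, ref_intercept = solution[:-1], solution[-1]
```

On the real data, `toy_features` and `toy_price` would be replaced by `features` and `price`, giving values to compare the gradient-descent result against.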
Code used to generate the dataset (it writes a data.csv file):
import csv
import random

n = 1000
avg_price_per_m2 = 1500

# newline='' stops the csv module from writing blank lines between rows on Windows
with open('data.csv', 'w', newline='') as dataset:
    writer = csv.DictWriter(dataset, ['area', 'rooms', 'balcony', 'age', 'price'])
    writer.writeheader()
    for i in range(n):
        area_in_m2 = round(random.uniform(25.00, 90.00), 2)
        number_of_rooms = random.randint(1, 8)
        has_balcony = random.randint(0, 1)
        age = random.randint(0, 70)
        # Base price
        price_per_m2 = avg_price_per_m2 + random.randint(50, 350)
        # Increase base price by a random amount between 100 and 300, multiplied by the number of rooms
        price_per_m2 += random.randint(100, 300) * number_of_rooms
        # Increase price by 3-8% (as written, this applies whether or not has_balcony is set)
        price_per_m2 *= random.uniform(1.03, 1.08)
        # Decrease price per m2 by 0.05 for each year of age
        price_per_m2 -= age * 0.05
        price = round(area_in_m2 * price_per_m2, 2)
        row = {'area': area_in_m2, 'rooms': number_of_rooms, 'balcony': has_balcony, 'age': age, 'price': price}
        writer.writerow(row)
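Because the generator above is unseeded, every run writes a different data.csv, so slopes reported from one run cannot be reproduced exactly. A minimal sketch of a seeded, in-memory variant follows; it keeps the same pricing rules, but the `make_rows` helper and its `seed` parameter are additions for illustration and not part of the original script.

```python
import random

def make_rows(n, seed, avg_price_per_m2=1500):
    """Build rows in memory using the same rules as the generator above."""
    rng = random.Random(seed)  # private seeded generator (added for reproducibility)
    rows = []
    for _ in range(n):
        area_in_m2 = round(rng.uniform(25.00, 90.00), 2)
        number_of_rooms = rng.randint(1, 8)
        has_balcony = rng.randint(0, 1)
        age = rng.randint(0, 70)
        price_per_m2 = avg_price_per_m2 + rng.randint(50, 350)
        price_per_m2 += rng.randint(100, 300) * number_of_rooms
        price_per_m2 *= rng.uniform(1.03, 1.08)
        price_per_m2 -= age * 0.05
        rows.append({
            'area': area_in_m2,
            'rooms': number_of_rooms,
            'balcony': has_balcony,
            'age': age,
            'price': round(area_in_m2 * price_per_m2, 2),
        })
    return rows

# The same seed yields the same rows on every call
sample = make_rows(5, seed=42)
same_again = make_rows(5, seed=42)
```

The returned list of dicts can be passed straight to `csv.DictWriter.writerows` or `pd.DataFrame` if a file or data frame is needed.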