I have a NumPy array of x variables (features) and a y variable, and I'd like to use them to calculate coefficients (weights) where each coefficient is between 0 and 1 and all the weights sum to 1. How would I go about doing this in Python?

I'm currently using Gekko, but every solution I get has all weights equal to 0 except for a single feature with a weight of 1. Based on my knowledge of the data, that doesn't make sense. My actual data has over 100 features and more than 5,000 rows.
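Written out, the fit being asked for is a constrained least-squares problem. The notation is introduced here for clarity: $x_{ji}$ is feature $i$ in row $j$, $y_j$ is the target for row $j$, and $w_i$ are the weights, with $n$ rows and $p$ features.

$$
\min_{w \in \mathbb{R}^p} \; \sum_{j=1}^{n} \Big( \sum_{i=1}^{p} x_{ji}\, w_i - y_j \Big)^2
\quad \text{subject to} \quad w_i \ge 0 \;\; (i = 1,\dots,p), \qquad \sum_{i=1}^{p} w_i = 1 .
$$

Because the weights are nonnegative and sum to 1, the upper bound $w_i \le 1$ follows automatically. Each prediction is also a weighted average of its own row, so $\min_i x_{ji} \le \sum_i x_{ji} w_i \le \max_i x_{ji}$.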
```python
import numpy as np
from gekko import GEKKO

x = np.array([[15., 21., 13.5, 12., 18., 15.5],
              [14.5, 20.5, 16., 14., 19.5, 20.5]])
y = np.array([55.44456011, 55.70023835])

# Number of variables (features) and data points (rows)
n_vars = x.shape[1]
n_data = y.shape[0]

# Create a Gekko model
m = GEKKO()

# Set up variables: one weight per feature, bounded to [0, 1]
weights = [m.Var(lb=0, ub=1) for _ in range(n_vars)]

# Set up objective function: sum of squared residuals
y_pred = [m.Intermediate(m.sum([weights[i] * x[j, i] for i in range(n_vars)])) for j in range(n_data)]
objective = m.sum([(y_pred[i] - y[i]) ** 2 for i in range(n_data)])
m.Obj(objective)

# Constraint: sum of weights = 1
m.Equation(m.sum(weights) == 1)

# Solver options
m.options.SOLVER = 3  # Use IPOPT solver
m.options.IMODE = 3   # Steady-state optimization mode
# m.options.APPENDEXE = 1 # Enable parallel computing

# Solve the optimization problem
m.solve(disp=False)

# Get the optimized weights
optimized_weights = [w.value[0] for w in weights]
```
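The following is a minimal sketch of one alternative: solve the same constrained least-squares problem with SciPy's `minimize` using the SLSQP method instead of Gekko. The helper `fit_simplex_weights` and the synthetic data are hypothetical and for illustration only. The script first prints each row's feature range next to its target. Since the weights are nonnegative and sum to 1, each prediction `x[j] @ w` is a weighted average of row j's features and cannot leave that range. In the sample data, every feature is at most 21 while both targets are about 55.4. Feature 1 is the largest (or tied for largest) in both rows. The constrained optimum is therefore all weight on that single feature, which matches the result described in the question. The synthetic check then builds targets from known interior weights to see whether the same solver returns non-corner weights when such weights actually fit the data.

```python
import numpy as np
from scipy.optimize import minimize


def fit_simplex_weights(x, y):
    """Minimize ||x @ w - y||^2 subject to w >= 0 and sum(w) == 1 (SLSQP sketch)."""
    n_vars = x.shape[1]
    w0 = np.full(n_vars, 1.0 / n_vars)  # feasible starting point: equal weights

    def objective(w):
        r = x @ w - y
        return r @ r

    def gradient(w):
        return 2.0 * x.T @ (x @ w - y)

    constraints = [{
        "type": "eq",
        "fun": lambda w: np.sum(w) - 1.0,  # sum of weights == 1
        "jac": lambda w: np.ones_like(w),
    }]
    bounds = [(0.0, 1.0)] * n_vars  # 0 <= w_i <= 1
    res = minimize(objective, w0, jac=gradient, method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x, res


# Sample data from the question.
x = np.array([[15., 21., 13.5, 12., 18., 15.5],
              [14.5, 20.5, 16., 14., 19.5, 20.5]])
y = np.array([55.44456011, 55.70023835])

# Each prediction x[j] @ w is a weighted average of row j's features,
# so it can only reach y[j] if y[j] lies in [x[j].min(), x[j].max()].
print("row min:", x.min(axis=1))
print("row max:", x.max(axis=1))
print("target: ", y)

w_sample, res_sample = fit_simplex_weights(x, y)
print("sample weights:", np.round(w_sample, 4))

# Hypothetical synthetic check: targets generated from known interior weights.
rng = np.random.default_rng(0)
x_syn = rng.uniform(10.0, 20.0, size=(500, 5))
w_true = rng.dirichlet(np.ones(5))
y_syn = x_syn @ w_true + rng.normal(0.0, 0.01, size=500)

w_fit, res_fit = fit_simplex_weights(x_syn, y_syn)
print("true weights:  ", np.round(w_true, 3))
print("fitted weights:", np.round(w_fit, 3))
```

The analytic gradient and constraint Jacobian are optional, but they save SLSQP from estimating derivatives by finite differences. That helps more as the number of features grows toward the 100+ in the real data. If the real fit still collapses onto a single feature, the same range check is a quick way to see whether the targets can be reached by any weighted average of the features at all.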