I mean, you could always just sneak the data-making process in using merf
:P. The majority of the data generation below is taken from the manifoldai merf example:
from merf.utils import MERFDataGenerator
import numpy as np
from mlens.ensemble import SuperLearner
from sklearn.svm import SVR
from sklearn.linear_model import Lasso
from mlens.metrics.metrics import rmse
# generator parameters taken from the manifoldai example
dgm = MERFDataGenerator(m=0.6, sigma_b=np.sqrt(4.5), sigma_e=1)
num_clusters_each_size = 20
train_sizes = [1, 3, 5, 7, 9]
known_sizes = [9, 27, 45, 63, 81]
new_sizes = [10, 30, 50, 70, 90]
train_cluster_sizes = MERFDataGenerator.create_cluster_sizes_array(train_sizes, num_clusters_each_size)
known_cluster_sizes = MERFDataGenerator.create_cluster_sizes_array(known_sizes, num_clusters_each_size)
new_cluster_sizes = MERFDataGenerator.create_cluster_sizes_array(new_sizes, num_clusters_each_size)
# split into training data, test data on known clusters, and test data on brand-new clusters;
# ptev / prev are variance diagnostics reported by the generator
train, test_known, test_new, training_cluster_ids, ptev, prev = dgm.generate_split_samples(train_cluster_sizes, known_cluster_sizes, new_cluster_sizes)
X_train = train[['X_0', 'X_1', 'X_2']]  # fixed-effect features
Z_train = train[['Z']]                  # random-effect design column
clusters_train = train['cluster']       # cluster ids
y_train = train['y']                    # target
Then make the fit and prediction, with some modification of Flennerhag's mlens.ensemble superlearner.py (GitHub):
ensemble = SuperLearner()
ensemble.add([SVR(), Lasso()])   # base learners
ensemble.add_meta(SVR())         # meta learner stacked on top
pred = ensemble.fit(X_train, y_train).predict(X_train)
root = rmse(y_train, pred)       # note: RMSE on the training data itself, i.e. in-sample
print(root)
>>>
2.345318341087564
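Since that RMSE is computed on the training data, it is an in-sample number. The same generate_split_samples call already handed you held-out clusters, so an out-of-sample check costs two lines; a minimal sketch, assuming test_known carries the same columns as train (which is how the generator lays them out):
# out-of-sample check on the held-out known-cluster test set
X_known = test_known[['X_0', 'X_1', 'X_2']]
y_known = test_known['y']
pred_known = ensemble.predict(X_known)
print(rmse(y_known, pred_known))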
But of course, there is always a better overall method if you don't mind specifically tying merf and the ensemble together.
from keras.models import Sequential
from keras.layers import Dense
from keras import backend
import matplotlib.pyplot as plt

# custom RMSE metric (Keras only ships MSE as a loss by default)
def rmse(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true), axis=-1))

X = X_train.to_numpy().flatten()  # flatten the three merf features into one 1-D array
model = Sequential()
model.add(Dense(2, input_dim=1, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=[rmse])
# fit X against itself (an identity mapping), one full batch per epoch
history = model.fit(X, X, epochs=500, batch_size=len(X), verbose=2)
plt.plot(history.history['rmse'])
plt.title("keras rmse metric")
plt.show()
>>>
(plot of the rmse metric over the 500 epochs)
Do note that the X_train used here is from the previous merf code:
X_train = train[['X_0', 'X_1', 'X_2']]
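One caveat: the keras snippet above only learns an identity map on the flattened features. If what you actually want is the network predicting y from the merf features (my reading of using them "together", not something the snippet above does), a minimal sketch would be:
# assumption: predict y_train from the three merf features instead of reconstructing X
X3 = X_train.to_numpy()  # shape (n_samples, 3), features kept separate
model2 = Sequential()
model2.add(Dense(8, input_dim=3, activation='relu'))
model2.add(Dense(1))
model2.compile(loss='mse', optimizer='adam', metrics=[rmse])
model2.fit(X3, y_train.to_numpy(), epochs=500, batch_size=len(X3), verbose=0)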