I am trying to port a basic binary logistic regression model from sklearn to TensorFlow, and I would like to use the Estimator class, because I just need a simple model. However, my two classes are imbalanced, so the sklearn version uses the class_weight parameter, and I do not see an equivalent option for the Estimator.
I'm trying to get the same functionality as:
import sklearn.linear_model

class_weight = {"false": 1, "true": 10}
model = sklearn.linear_model.LogisticRegression(class_weight=class_weight)
model.fit(X, Y)
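As I understand it, class_weight is just shorthand for weighting each training example by its class, so the call above should be roughly equivalent to passing explicit per-sample weights, which is the behaviour I want to reproduce in TensorFlow:

# roughly equivalent: expand the class weights into per-sample weights
sample_weight = [class_weight[y] for y in Y]
model = sklearn.linear_model.LogisticRegression()
model.fit(X, Y, sample_weight=sample_weight)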
I tried using weight_column, but it did not seem to train properly: it produces a lot more false positives, presumably because the class imbalance is not being accounted for. I also looked into changing the loss function, but every example I have found is not written for the Estimator framework: Weighted Training Examples in Tensorflow
import numpy as np
import tensorflow as tf

def input_fn(X, Y):
    features = dict(X)
    features['weight'] = np.ones(len(X))  # every example currently gets the same weight
    dataset = tf.data.Dataset.from_tensor_slices((features, Y))
    return dataset.repeat(1).batch(len(Y))  # 1 epoch, no batching

model = tf.estimator.LinearClassifier(feature_columns, weight_column='weight')
model.train(input_fn=lambda: input_fn(X, Y))
But doing this gives accuracy around 6% lower, which is fairly significant, and most of the drop comes from false positives, so the model doesn't seem to be weighting the negative/False cases properly.
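For what it's worth, my best guess at how weight_column is meant to replicate class_weight is something like the sketch below, where each example's weight is looked up from its label instead of being set to 1 (this assumes Y holds the same "false"/"true" labels used as keys in the sklearn class_weight dict, and the weight values just mirror that dict). Is this the intended approach?

import numpy as np
import tensorflow as tf

class_weight = {"false": 1.0, "true": 10.0}  # mirrors the sklearn class_weight dict

def weighted_input_fn(X, Y):
    features = dict(X)
    # per-example weight derived from the class label, instead of all ones
    features['weight'] = np.array([class_weight[y] for y in Y], dtype=np.float32)
    dataset = tf.data.Dataset.from_tensor_slices((features, Y))
    return dataset.repeat(1).batch(len(Y))  # 1 epoch, no batching

model = tf.estimator.LinearClassifier(feature_columns, weight_column='weight')
model.train(input_fn=lambda: weighted_input_fn(X, Y))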