
I am trying to port a basic binary sklearn logistic regression model to TensorFlow, and I would like to use the Estimator class because I just need a simple model. However, my two classes are imbalanced, so the sklearn version used the class_weight parameter, and I do not see an equivalent parameter for the Estimator.

I'm trying to get the same functionality as:

import sklearn.linear_model

class_weight = {"false": 1, "true": 10}
model = sklearn.linear_model.LogisticRegression(class_weight=class_weight)
model.fit(X, Y)
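For reference, sklearn's class_weight amounts to giving every training example the weight of its class, so the same fit can be reproduced with an explicit per-example sample_weight vector. This is the kind of vector a weight column would need to carry. The sketch below assumes 0/1 labels and uses synthetic data; it is an illustration, not the original model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic, imbalanced toy data (assumption: binary 0/1 labels, positives are the minority).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
logits = X @ np.array([1.5, -2.0, 0.5]) - 2.5
Y = (rng.random(500) < 1 / (1 + np.exp(-logits))).astype(int)

class_weight = {0: 1.0, 1: 10.0}

# Option 1: let sklearn apply the class weights itself.
m1 = LogisticRegression(class_weight=class_weight).fit(X, Y)

# Option 2: expand the class weights into one weight per example.
sample_weight = np.where(Y == 1, class_weight[1], class_weight[0])
m2 = LogisticRegression().fit(X, Y, sample_weight=sample_weight)

# Both fits should agree, because they minimize the same weighted objective.
print(np.allclose(m1.coef_, m2.coef_), np.allclose(m1.intercept_, m2.intercept_))
```

Note that the class_weight keys must match the actual label values in Y; with 0/1 or boolean labels, string keys such as "true" would not match.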

I tried using weight_column, but the model did not seem to train properly: it produces many more false positives, as if the class imbalance were not being accounted for. I also looked into changing the loss function, but every example I found uses something other than the Estimator framework: Weighted Training Examples in Tensorflow
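To make concrete what per-example weights do to the loss: weighting each positive example by 10 gives the same summed log loss as repeating every positive example 10 times, which is why class weighting behaves like oversampling. A weight column of all ones, by contrast, leaves the loss unchanged. This NumPy sketch uses a plain sum of weighted losses and made-up probabilities; how tf.estimator actually reduces the weighted losses over a batch (sum vs. mean) is not modeled here.

```python
import numpy as np

def log_loss_terms(y, p):
    """Per-example binary cross-entropy for labels y in {0, 1} and probabilities p."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def weighted_log_loss(y, p, w):
    """Sum of per-example losses, each scaled by its weight w."""
    return np.sum(w * log_loss_terms(y, p))

# Toy labels and model probabilities (made-up numbers for illustration).
y = np.array([0, 0, 0, 1, 0, 1])
p = np.array([0.1, 0.3, 0.2, 0.6, 0.4, 0.8])

# Class weights in the spirit of {"false": 1, "true": 10}: positives count 10x.
w = np.where(y == 1, 10.0, 1.0)
weighted = weighted_log_loss(y, p, w)

# Oversampling: repeat every positive example 10 times, each with weight 1.
reps = np.where(y == 1, 10, 1)
oversampled = weighted_log_loss(np.repeat(y, reps), np.repeat(p, reps), np.ones(reps.sum()))

# A weight column of ones is equivalent to no weighting at all.
unweighted = weighted_log_loss(y, p, np.ones_like(p))

print(np.isclose(weighted, oversampled), np.isclose(unweighted, log_loss_terms(y, p).sum()))
```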

import numpy as np
import tensorflow as tf

def input_fn(X, Y):
  features = dict(X)
  # Every example currently gets the same weight of 1.0
  features['weight'] = np.ones(len(X))
  dataset = tf.data.Dataset.from_tensor_slices((features, Y))
  return dataset.repeat(1).batch(len(Y))  # 1 epoch, whole dataset in a single batch

# feature_columns is defined elsewhere
model = tf.estimator.LinearClassifier(feature_columns, weight_column='weight')
model.train(input_fn=lambda: input_fn(X, Y))
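For the weight column to mirror sklearn's class_weight, it would have to hold each example's class weight rather than a constant. A minimal sketch of building that column from the labels follows; class_weight_column is a hypothetical helper, and the dictionary keys are assumed to match the label values in Y exactly (booleans here).

```python
import numpy as np

def class_weight_column(labels, class_weight):
    """Hypothetical helper: map each label to its class weight (one float per example)."""
    labels = np.asarray(labels)
    # Fail loudly if a label has no weight, instead of silently defaulting.
    unknown = set(np.unique(labels).tolist()) - set(class_weight)
    if unknown:
        raise ValueError(f"no class weight for labels: {sorted(unknown)}")
    return np.array([class_weight[v] for v in labels.tolist()], dtype=np.float32)

# Usage with boolean labels (assumption about the type of Y).
Y = np.array([False, False, True, False, True])
w = class_weight_column(Y, {False: 1.0, True: 10.0})
print(w)  # one weight per example: 1.0 for False, 10.0 for True
```

Inside input_fn, features['weight'] = class_weight_column(Y, {False: 1.0, True: 10.0}) would then take the place of np.ones(len(X)).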

But doing this gives an accuracy about 6% lower than the sklearn model, which is fairly significant. Most of the difference comes from false positives, so the model isn't properly oversampling the negative/False cases.
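For intuition about how a class weight moves predictions, consider a simplified intercept-only logistic model (an assumption: the real model also has feature weights, and sklearn adds an L2 penalty by default). With $n_1$ positives weighted by $w$ and $n_0$ negatives weighted by 1, minimizing the weighted log loss gives:

```latex
L(b) = -w\,n_1 \log\sigma(b) - n_0 \log\bigl(1-\sigma(b)\bigr), \qquad
L'(b) = -w\,n_1\bigl(1-\sigma(b)\bigr) + n_0\,\sigma(b) = 0

\Rightarrow\quad \sigma(b^{*}) = \frac{w\,n_1}{w\,n_1 + n_0}, \qquad
b^{*} = \log w + \log\frac{n_1}{n_0}
```

So a weight of $w = 10$ on the positive class shifts the log-odds up by $\log 10 \approx 2.30$, which makes the weighted model predict the positive class more often than an unweighted one and trades false negatives for false positives.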

sanyassh
