
I oversampled my data using SMOTE like so:

>>> from imblearn.over_sampling import SMOTE
>>> X_resampled, y_resampled = SMOTE().fit_resample(X, y)

Now X_resampled and y_resampled are larger than the original data set. How can I tell the original samples apart from the synthetic ones?
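One way to approach this is sketched below, assuming that the resampled array starts with the original rows in their original order and that the synthetic rows are appended after them. That ordering is an assumption about the SMOTE implementation rather than something stated in the question, so the hypothetical helper `split_by_position` checks it before relying on it.

```python
import numpy as np


def split_by_position(X, X_resampled):
    """Hypothetical helper: flag rows past the original length as synthetic.

    Assumes the resampled array begins with the original rows, unchanged
    and in the same order. Raises if that assumption does not hold.
    """
    X = np.asarray(X)
    X_resampled = np.asarray(X_resampled)
    n_original = len(X)

    # Verify the ordering assumption instead of trusting it blindly.
    if not np.array_equal(X_resampled[:n_original], X):
        raise ValueError("resampled data does not start with the original rows")

    # True for every row that comes after the original block.
    return np.arange(len(X_resampled)) >= n_original
```

With the mask in hand, `X_resampled[mask]` would give the synthetic rows and `X_resampled[~mask]` the original ones; the same mask indexes `y_resampled`.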

rayryeng
Shlomi Schwartz
  • I rolled back your original question title. The other one could be misinterpreted and does not relate to your current question at all. – rayryeng Jun 08 '20 at 06:23
  • OK, but why not? X and y are numpy arrays, and X_resampled and y_resampled are numpy arrays that contain the original X and y. Comparing them would solve my issue. – Shlomi Schwartz Jun 08 '20 at 07:24
  • I was looking for the same answer and figured it out with a list comprehension over the numpy arrays: y_resampled_indicator = [ str(y_resampled[index]) if point in X else (str(y_resampled[index]) + '- synthetic') for index, point in enumerate(X_resampled)]. It works, though it is probably not the best way to do it, so if you have found a more 'numpy' way of doing it, I would be interested in the answer ;) – Cédric Guilmin Nov 11 '21 at 10:08
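A more 'numpy' version of the list comprehension in the last comment could look like the sketch below. One caveat about the original: for a 2-D array, `point in X` evaluates `(X == point).any()`, so it is true as soon as a single coordinate matches, not only when a whole row matches. The hypothetical helper `synthetic_mask` compares full rows instead. It assumes the original rows are copied into the resampled array unchanged, so exact equality is meaningful, and it cannot distinguish a synthetic row that happens to coincide exactly with an original one.

```python
import numpy as np


def synthetic_mask(X, X_resampled):
    """Hypothetical helper: True where a resampled row matches no original row."""
    X = np.asarray(X)
    X_resampled = np.asarray(X_resampled)

    # Broadcast to shape (n_resampled, n_original, n_features) and require
    # every feature to match for a pair of rows to count as equal.
    row_matches = (X_resampled[:, None, :] == X[None, :, :]).all(axis=2)

    # A row is synthetic when it equals none of the original rows.
    return ~row_matches.any(axis=1)
```

The intermediate boolean array grows with n_resampled × n_original × n_features, so for large data sets it may be worth processing `X_resampled` in chunks. Labels in the style of the comment can then be built with `np.where(mask, np.char.add(y_resampled.astype(str), '- synthetic'), y_resampled.astype(str))`.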

0 Answers