Python - How to differentiate SMOTE resampling from original data

Asked Jun 07 '20 at 13:58

Active Jun 08 '20 at 06:23


I oversampled my data using SMOTE like so:

>>> from imblearn.over_sampling import SMOTE
>>> X_resampled, y_resampled = SMOTE().fit_resample(X, y)

Now X_resampled and y_resampled are larger than the original data set. How can I tell the original samples apart from the synthetic ones?

edited Jun 08 '20 at 06:23 by rayryeng

asked Jun 07 '20 at 13:58 by Shlomi Schwartz

I rolled back your original question title. The other one could be misinterpreted and does not relate to your current question at all. – rayryeng Jun 08 '20 at 06:23
OK, but why not? X, y are NumPy arrays, and X_resampled, y_resampled are NumPy arrays containing the original X, y. Comparing the differences between them would solve my issue. – Shlomi Schwartz Jun 08 '20 at 07:24
y_resampled_indicator = [str(y_resampled[index]) if point in X else (str(y_resampled[index]) + '- synthetic') for index, point in enumerate(X_resampled)]. I was looking for the same answer and figured it out with a list comprehension over the NumPy arrays. It works. It is probably not the best way to do it, so if you have found a more 'NumPy' way, I would be interested in the answer ;) – Cédric Guilmin Nov 11 '21 at 10:08
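
A possible more 'NumPy' variant of the comparison idea in the comments, offered as a sketch rather than a definitive answer. One caveat about the list comprehension: for NumPy arrays, `point in X` evaluates `(X == point).any()`, so it is true as soon as any single element matches, not only when a whole row matches. The hypothetical helper `synthetic_mask` below compares complete rows instead. The demo data is made up and only stands in for real SMOTE output; imbalanced-learn is not called here.

```python
import numpy as np


def synthetic_mask(X, X_resampled):
    """Return a boolean array that is True for rows of X_resampled not found in X.

    Hypothetical helper, not part of imbalanced-learn. Rows are compared by
    exact value, so a synthetic row that happens to equal an original row
    would be reported as original.
    """
    X = np.asarray(X)
    X_resampled = np.asarray(X_resampled)
    # Broadcast to shape (n_resampled, n_original, n_features), require all
    # features to match, then check whether any original row matched.
    is_original = (X_resampled[:, None, :] == X[None, :, :]).all(axis=2).any(axis=1)
    return ~is_original


# Made-up stand-in for SMOTE output: the originals followed by two extra rows.
X = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
X_resampled = np.vstack([X, [[1.0, 2.0], [3.0, 4.0]]])
y_resampled = np.array([0, 1, 1, 0, 0])

mask = synthetic_mask(X, X_resampled)
print(mask.tolist())  # [False, False, False, True, True]

# Labels of the rows flagged as synthetic
print(y_resampled[mask].tolist())  # [0, 0]
```

The broadcast comparison needs memory proportional to n_resampled * n_original * n_features, so it is only practical for modest data sizes. As far as I can tell, the imbalanced-learn versions I have looked at append the synthetic rows after the original ones, in which case `np.arange(len(X_resampled)) >= len(X)` would give the same mask much more cheaply. I am not aware of that ordering being a documented guarantee, so treat it as an assumption and check it against your installed version.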

0 Answers