Highest Voted 'oversampling' Questions

35 votes, 1 answer

Using Smote with Gridsearchcv in Scikit-learn

I'm dealing with an imbalanced dataset and want to do a grid search to tune my model's parameters using scikit's gridsearchcv. To oversample the data, I want to use SMOTE, and I know I can include that as a stage of a pipeline and pass it to…

asked May 09 '18 at 04:46

Ehsan M

16 votes, 5 answers

SMOTE initialisation expects n_neighbors <= n_samples, but n_samples < n_neighbors

I have already pre-cleaned the data, and below shows the format of the top 4 rows: [IN] df.head() [OUT] Year cleaned 0 1909 acquaint hous receiv follow letter clerk crown... 1 1909 ask secretari state war…

scikit-learn knn tf-idf oversampling imblearn

asked Mar 20 '18 at 23:48

Dbercules

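The error in this question comes from SMOTE's neighbour search. As far as I know, imblearn's `SMOTE` fits a nearest-neighbour model with `k_neighbors + 1` neighbours (the extra one is the sample itself) on each class it oversamples, so a class with fewer than `k_neighbors + 1` rows raises `n_samples < n_neighbors`. A minimal sketch, assuming the labels are available as a plain list; `safe_k_neighbors` is a hypothetical helper (not part of imblearn) that conservatively uses the smallest class overall:

```python
from collections import Counter


def safe_k_neighbors(labels, default_k=5):
    """Hypothetical helper: the largest k_neighbors SMOTE can use for these labels.

    SMOTE searches k_neighbors + 1 neighbours inside a class (the sample itself
    plus k others), so k_neighbors must be at most (class size - 1).
    The smallest class overall is used as a conservative bound.
    """
    smallest = min(Counter(labels).values())
    if smallest < 2:
        raise ValueError("every class needs at least 2 samples to interpolate between")
    return min(default_k, smallest - 1)


# Usage with imblearn (not imported here):
#   from imblearn.over_sampling import SMOTE
#   smote = SMOTE(k_neighbors=safe_k_neighbors(y), random_state=42)
labels = ["a"] * 40 + ["b"] * 3
print(safe_k_neighbors(labels))  # 2
```

Lowering `k_neighbors` avoids the error, but with very small classes the synthetic samples are interpolated from only a handful of neighbours.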

11 votes, 1 answer

Duplicating training examples to handle class imbalance in a pandas data frame

I have a DataFrame in pandas that contains training examples, for example: feature1 feature2 class 0 0.548814 0.791725 1 1 0.715189 0.528895 0 2 0.602763 0.568045 0 3 0.544883 0.925597 0 4 0.423655 0.071036 0 5…

python pandas machine-learning oversampling

asked Jan 22 '18 at 00:10

Franck Dernoncourt

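One way to duplicate training examples is plain random oversampling: keep every original row and add randomly drawn duplicates of the smaller classes until each class matches the largest one. A minimal sketch, assuming the label column is named `class` as in the excerpt; `oversample_by_duplication` is a hypothetical helper built only on `DataFrame.sample`, `groupby` and `pd.concat`, and the five rows are the ones shown in the question preview:

```python
import pandas as pd


def oversample_by_duplication(df, label_col="class", random_state=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    target = df[label_col].value_counts().max()
    parts = []
    for _, group in df.groupby(label_col):
        # Keep every original row, then add (target - len(group)) random duplicates.
        extra = group.sample(n=target - len(group), replace=True, random_state=random_state)
        parts.append(pd.concat([group, extra]))
    # Shuffle so duplicated rows are not clustered at the end of each class block.
    return pd.concat(parts).sample(frac=1, random_state=random_state).reset_index(drop=True)


df = pd.DataFrame({
    "feature1": [0.548814, 0.715189, 0.602763, 0.544883, 0.423655],
    "feature2": [0.791725, 0.528895, 0.568045, 0.925597, 0.071036],
    "class": [1, 0, 0, 0, 0],
})
balanced = oversample_by_duplication(df)
print(balanced["class"].value_counts().sort_index().tolist())  # [4, 4]
```

Only the training split should be oversampled this way; duplicating before splitting lets copies of the same row land in both train and test.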

10 votes, 3 answers

Use SMOTE to oversample image data

I'm doing a binary classification with CNNs and the data is imbalanced where the positive medical image : negative medical image = 0.4 : 0.6. So I want to use SMOTE to oversample the positive medical image data before training. However, the…

image-processing machine-learning scikit-learn deep-learning oversampling

asked Dec 07 '18 at 09:35

Salmon

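SMOTE expects a 2-D `(n_samples, n_features)` array, so image stacks are usually flattened before resampling and reshaped afterwards. The sketch below is a from-scratch NumPy version of basic SMOTE interpolation (not imblearn's implementation) applied to flattened minority-class images; `smote_images` and the random toy images are hypothetical. It builds an N x N distance matrix, so it only suits modest minority sets, and each synthetic image is a pixel-wise blend of two real ones, which may not look like a realistic medical image.

```python
import numpy as np


def smote_images(images, n_new, k=5, seed=0):
    """Hypothetical helper: SMOTE-style synthetic images from one minority class.

    images: array of shape (N, H, W) or (N, H, W, C), all from the minority class.
    Returns n_new float64 images with the same per-image shape.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    if n < 2:
        raise ValueError("need at least 2 minority images")
    k = min(k, n - 1)
    flat = images.reshape(n, -1).astype(np.float64)  # (samples, features), as SMOTE expects

    # Squared Euclidean distances via |a|^2 + |b|^2 - 2ab, avoiding an (N, N, D) array.
    sq = (flat ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * flat @ flat.T
    np.fill_diagonal(dist, np.inf)  # an image is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest minority images per image

    base = rng.integers(0, n, size=n_new)  # random starting image for each new sample
    partner = neighbours[base, rng.integers(0, k, size=n_new)]  # one of its k neighbours
    lam = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = flat[base] + lam * (flat[partner] - flat[base])
    return synthetic.reshape((n_new,) + images.shape[1:])


positives = np.random.default_rng(1).random((6, 8, 8, 1))  # toy stand-in for real images
new_positives = smote_images(positives, n_new=10, k=3)
print(new_positives.shape)  # (10, 8, 8, 1)
```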

9 votes, 2 answers

Weighted random sampler - oversample or undersample?

Problem: I am training a deep learning model in PyTorch for binary classification, and I have a dataset containing unbalanced class proportions. My minority class makes up about 10% of the given observations. To avoid the model learning to just…

pytorch oversampling pytorch-dataloader

asked Jun 02 '21 at 05:10

clueless

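A common recipe for PyTorch's `torch.utils.data.WeightedRandomSampler` is to give every sample a weight inversely proportional to its class size. With `replacement=True` and `num_samples` equal to the dataset length, this does both: minority samples are drawn repeatedly (oversampled), while in a given epoch many majority samples are not drawn at all (effectively undersampled). The sketch computes the weights in plain Python and simulates the draws with `random.choices` instead of importing PyTorch; `inverse_frequency_weights` is a hypothetical helper and the 90/10 labels are toy data matching the question's 10% minority share.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-sample weights 1 / (size of the sample's class); each class then sums to 1."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]


labels = [0] * 90 + [1] * 10  # minority class is about 10% of observations
weights = inverse_frequency_weights(labels)

# In PyTorch (not imported here) these weights would be passed as:
#   sampler = torch.utils.data.WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
#   loader = torch.utils.data.DataLoader(dataset, batch_size=32, sampler=sampler)
# Simulate many draws with replacement to check the resulting class balance.
rng = random.Random(0)
drawn = rng.choices(labels, weights=weights, k=20_000)
share_minority = drawn.count(1) / len(drawn)
print(round(share_minority, 2))  # roughly 0.5
```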

6 votes, 3 answers

using sklearn.train_test_split for Imbalanced data

I have a very imbalanced dataset. I used the sklearn.train_test_split function to extract the training set. Now I want to oversample the training set, so I counted the number of type1 samples (my dataset has 2 categories, type1 and type2), but…

python-3.x scikit-learn training-data imbalanced-data oversampling

asked May 19 '20 at 07:16

Maryam

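The usual advice here is to split first and oversample only the training set, so the test set keeps the real class ratio and duplicated rows cannot leak across the split. A sketch under these assumptions: `X` and `y` are NumPy arrays, the labels are the question's `type1`/`type2`, and `split_then_oversample` is a hypothetical helper built on scikit-learn's `train_test_split` (with `stratify`) and `sklearn.utils.resample`:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample


def split_then_oversample(X, y, test_size=0.2, random_state=0):
    """Split first, then randomly oversample only the training part.

    The test set keeps its natural class distribution, and no duplicated
    row can end up on both sides of the split.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    classes, counts = np.unique(y_train, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        Xc, yc = X_train[y_train == c], y_train[y_train == c]
        if len(yc) < target:
            # Draw (target - len) extra rows with replacement and append them to the originals.
            X_extra, y_extra = resample(
                Xc, yc, replace=True, n_samples=target - len(yc), random_state=random_state
            )
            Xc, yc = np.concatenate([Xc, X_extra]), np.concatenate([yc, y_extra])
        X_parts.append(Xc)
        y_parts.append(yc)
    return np.concatenate(X_parts), np.concatenate(y_parts), X_test, y_test


X = np.arange(100).reshape(50, 2)
y = np.array(["type1"] * 40 + ["type2"] * 10)
X_tr, y_tr, X_te, y_te = split_then_oversample(X, y)
```

With 40 vs 10 samples and `test_size=0.2`, the stratified split leaves 32 vs 8 training rows, and the minority side is then topped up to 32.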

6 votes, 2 answers

Oversampling or SMOTE in Pyspark

I have 7 classes, the total number of records is 115, and I wanted to run a Random Forest model on this data. But the data is not enough to get high accuracy, so I wanted to apply oversampling to all the classes in a way that the majority…

machine-learning pyspark random-forest oversampling

asked Dec 26 '18 at 20:31

Surbhi Jain

6 votes, 1 answer

How to apply SMOTE technique (oversampling) before word embedding layer

How can I apply the SMOTE algorithm before the word embedding layer in an LSTM? I have a binary text classification problem (Good (9500) or Bad (500) reviews, 10000 training samples in total, so the training set is imbalanced); meanwhile, I am using an LSTM with…

python-3.x tensorflow deep-learning oversampling

asked Nov 19 '18 at 23:41

user1531248

5 votes, 1 answer

SMOTE function not working in make_pipeline

I want to apply cross-validation and oversampling simultaneously. I get the following error from this code: from sklearn.pipeline import Pipeline, make_pipeline imba_pipeline = make_pipeline(SMOTE(random_state=42), …

python scikit-learn cross-validation oversampling smote

asked Nov 12 '19 at 19:00

Vahid the Great

5 votes, 1 answer

Upsampling: insert extra values between each consecutive elements of a vector

Suppose we have a vector V consisting of 20 floating point numbers. Is it possible to insert values between each pair of these floating points such that vector V becomes a vector of exactly 50 numbers? The inserted value should be a random number…

c++ oversampling

asked Jul 31 '19 at 09:14

student_11

5 votes, 1 answer

How to resample text (imbalanced groups) in a pipeline?

I'm trying to do some text classification using MultinomialNB, but I'm running into problems because my data is unbalanced. (Below is some sample data for simplicity. In actuality, mine is much larger.) I'm trying to resample my data using…

python pipeline text-classification resampling oversampling

asked Jan 09 '19 at 20:45

Kelsey

4 votes, 1 answer

TypeError: __init__() got an unexpected keyword argument 'ratio' when using SMOTE

I am using SMOTE to oversample as my dataset is imbalanced. I am getting an unexpected argument error. But in the documentation, the ratio argument is defined for SMOTE. Can someone help me understand where I am going wrong? Code snippet from…

oversampling imblearn smote

asked Jun 06 '20 at 00:11

anushiya-thevapalan

3 votes, 2 answers

Over and under sample multi-class training examples (rows) in a pandas dataframe to specified values

I would like to make a multi-class pandas dataframe more balanced for training. A simplified version of my training set looks as follows: Imbalanced dataframe: counts for class 0, 1 and 2 are respectively 7, 3 and 1 animal class 0 dog1 …

python pandas dataframe oversampling

asked Jul 16 '21 at 17:31

Simon

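To bring each class to a specified count, classes above their target can be sampled without replacement and classes below it topped up with random duplicates. A minimal pandas sketch; `resample_to_counts` is a hypothetical helper, the toy frame only mirrors the class counts 7, 3 and 1 from the question (the cat and bird names are made up), and the target of 4 rows per class is an arbitrary choice:

```python
import pandas as pd


def resample_to_counts(df, targets, label_col="class", random_state=0):
    """Hypothetical helper: resample each class to an exact row count.

    targets maps class label -> desired number of rows. Classes above their
    target are undersampled without replacement; classes below it keep all
    their rows plus random duplicates.
    """
    parts = []
    for label, n in targets.items():
        group = df[df[label_col] == label]
        if n <= len(group):
            parts.append(group.sample(n=n, replace=False, random_state=random_state))
        else:
            extra = group.sample(n=n - len(group), replace=True, random_state=random_state)
            parts.append(pd.concat([group, extra]))
    return pd.concat(parts).reset_index(drop=True)


# Toy frame with the question's class counts 7, 3 and 1.
df = pd.DataFrame({
    "animal": [f"dog{i}" for i in range(1, 8)] + ["cat1", "cat2", "cat3"] + ["bird1"],
    "class": [0] * 7 + [1] * 3 + [2],
})
balanced = resample_to_counts(df, {0: 4, 1: 4, 2: 4})  # target counts are an assumption
print(balanced["class"].value_counts().sort_index().tolist())  # [4, 4, 4]
```

Classes missing from `targets` are dropped, so every label should be listed.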

3 votes, 3 answers

Imbalanced Image Dataset (Tensorflow2)

I'm trying to do a binary image classification problem, but the two classes (~590 and ~5900 instances, for class 1 and 2, respectively) are heavily skewed, but still quite distinct. Is there any way I can fix this? I want to try SMOTE/random…

tensorflow keras imbalanced-data image-classification oversampling

asked Feb 02 '21 at 20:03

Sakib Ahamed

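Besides SMOTE or random oversampling, a common alternative in Keras is class weighting: leave the data as is and pass `class_weight` to `model.fit`, so errors on the rare class cost more. The sketch computes "balanced" weights with the formula scikit-learn uses for `class_weight='balanced'`, `n_samples / (n_classes * count)`, in plain Python. `balanced_class_weights` is a hypothetical helper, and the counts mirror the question's ~590 vs ~5900 images relabelled as 0 and 1, since Keras expects integer class indices as keys.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


labels = [0] * 590 + [1] * 5900
weights = balanced_class_weights(labels)
print({c: round(w, 2) for c, w in weights.items()})  # {0: 5.5, 1: 0.55}
# Then, in Keras (not imported here), with a hypothetical train_ds:
#   model.fit(train_ds, epochs=10, class_weight=weights)
```

With these weights both classes contribute the same total weight to the loss (590 x 5.5 = 5900 x 0.55), without duplicating any images.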

3 votes, 2 answers

Oversampling a sparse dataset in Python

I have a dataset with multi-labeled data. There are 20 labels in total (from 0 to 20), with an imbalanced distribution among them. Here is an overview of the data: |id |label|value | |-----|-----|------------| |95534|0 …

python pandas dataframe oversampling smote

asked Sep 11 '20 at 17:27

LoneWolf
