Questions tagged [feature-scaling]

51 questions
3
votes
1 answer

How to implement PySpark StandardScaler on subset of columns?

I want to use pyspark StandardScaler on 6 out of 10 columns in my dataframe. This will be part of a pipeline. The inputCol parameter seems to expect a vector, which I can pass in after using VectorAssembler on all my features, but this scales all 10…
Insu Q
  • 403
  • 6
  • 13
3
votes
3 answers

Normalizing data before removing low-variance features causes errors

I'm testing the iris dataset (it can be loaded with the function load_iris() from sklearn.datasets) with the scikit-learn functions normalize and VarianceThreshold. It seems that if I use MinMaxScaler and then run VarianceThreshold, there are no…
Boom
  • 1,145
  • 18
  • 44
2
votes
0 answers

Some columns became NaN after scaling

I'm trying to scale features with the following function: def featureNormalize(X): ''' This function takes the features as input and returns the normalized values, the mean, as well as the standard deviation for each feature. ''' X_norm = (X -…
Krutch Dd
  • 33
  • 4
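
One common way a column turns into NaN under this kind of normalization is a zero standard deviation: for a constant column, NumPy evaluates (X - mean) / std as 0/0. Whether that is the cause here cannot be told from the truncated excerpt, so the following is only a minimal, dependency-free sketch under that assumption. The name `feature_normalize` and the decision to map constant columns to 0.0 are illustrative choices, not the asker's code.

```python
import statistics


def feature_normalize(rows):
    """Standardize each column of a list of rows; return (normalized, means, stds)."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Population standard deviation (divides by n).
    stds = [statistics.pstdev(col) for col in columns]
    # Assumption: a constant column (std == 0) is divided by 1.0 instead,
    # so it ends up as 0.0 rather than triggering a division by zero.
    safe_stds = [s if s > 0 else 1.0 for s in stds]
    normalized = [
        [(x - m) / s for x, m, s in zip(row, means, safe_stds)]
        for row in rows
    ]
    return normalized, means, stds


X = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]  # second column is constant
X_norm, mu, sigma = feature_normalize(X)
```

The returned `stds` keeps the true zero, so a caller can still detect and drop constant columns if that is preferable to keeping them.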
2
votes
1 answer

Applying Feature Scaling in a Neural Network

I have two questions: Do I have to apply feature scaling to ALL features in a neural network (and in deep learning too)? How can I scale categorical features in a dataset for a neural network (if needed)?
Andrei
  • 73
  • 1
  • 13
2
votes
1 answer

Data normalization and rescaling value in Python

I have a dataset that contains URLs with their publish date (YYYY-MM-DD) and visits. I want to calculate a benchmark (average) of visits for a complete year. Pages were published on different dates, e.g. the weightage/contribution of the 1st page published in Aug…
2
votes
2 answers

Use same Min and Max Data for Multiple Features in MinMaxScaler

I have a dataset of 5 features. Two of these features are very similar but do not have the same min and max values. ... | feature 2 | feature 3 | ... -------------------------------- ..., 208.429993, 206.619995, ... ..., 207.779999, 205.050003,…
bcsta
  • 1,963
  • 3
  • 22
  • 61
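
One way to read this requirement is to compute a single minimum and maximum over the two similar columns together and apply that shared range to both. The sketch below assumes that reading; `shared_min_max_scale` is a hypothetical helper, and the sample values are just the four numbers visible in the excerpt.

```python
def shared_min_max_scale(*columns):
    """Scale several columns to [0, 1] using one min and max pooled over all of them."""
    pooled = [x for col in columns for x in col]
    lo, hi = min(pooled), max(pooled)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    return [[(x - lo) / (hi - lo) for x in col] for col in columns]


feature_2 = [208.429993, 207.779999]
feature_3 = [206.619995, 205.050003]
scaled_2, scaled_3 = shared_min_max_scale(feature_2, feature_3)
```

Because both columns share one range, a value in one column stays directly comparable to a value in the other after scaling.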
1
vote
1 answer

Understanding the Implications of Scaling Test Data Using the Same Scaler Object as Training Data

I am currently working on a machine learning project and have encountered a dilemma regarding the scaling of test data. I understand that when scaling features, we fit the scaler object using the training data and then transform both the training…
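
The fit-on-train, transform-both pattern described in this question can be sketched without any library. `SimpleStandardScaler` is a hypothetical stand-in that mirrors the `fit`/`transform` split of scikit-learn's `StandardScaler` for a single feature; it is not the library's implementation.

```python
import statistics


class SimpleStandardScaler:
    """Single-feature standardizer: learns mean/std in fit(), reuses them in transform()."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        # Fall back to 1.0 for a constant feature to avoid dividing by zero.
        self.scale_ = statistics.pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]


train = [10.0, 20.0, 30.0]
test = [20.0, 40.0]

scaler = SimpleStandardScaler().fit(train)  # statistics come from train only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)        # same mean/std applied to test
```

Here the test value 40.0 lands outside the range seen in training, which is expected: the test set is treated as unseen data, so its own statistics never enter the scaler.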
1
vote
1 answer

Strange results when scaling data using scikit learn

I have an input dataset that has 4 time series with 288 values for 80 days. So the actual shape is (80,4,288). I would like to cluster different days. I have 80 days and all of them have 4 time series: outside temperature, solar radiation, electrical…
PeterBe
  • 700
  • 1
  • 17
  • 37
1
vote
1 answer

Do features need to be scaled in Logistic Regression?

I have a training set with one feature (credit balance), with values ranging from 0 to 20,000. The response is either 0 (Default=No) or 1 (Default=Yes). This was a simulated training set generated using a logistic function. For reference it is available…
1
vote
1 answer

Feature Scaling for Time Series Forecasting

I am in the process of conducting a time series analysis (a multivariate time series, to be precise), and before feeding the inputs to my LSTM model, I have scaled them. The metrics that I am using to evaluate my model are the loss and mean absolute…
Minura Punchihewa
  • 1,498
  • 1
  • 12
  • 35
1
vote
1 answer

Invert feature scaling

In my dataset I have a binary Target (0 or 1) variable, and 8 features: nchar, rtc, Tmean, week_day, hour, ntags, nlinks and nex. week_day is a factor while the others are numeric. I built a decision tree classifier, but my question concerns the…
Mark
  • 1,577
  • 16
  • 43
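
The excerpt does not say which scaling was applied, so the following sketch assumes plain standardization and shows how it can be undone by keeping the mean and standard deviation from the forward step. The helper names and the sample `tmean` values are made up for illustration.

```python
import statistics


def standardize(values):
    """Return (scaled values, mean, std) so the transform can be inverted later."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("constant feature; standardization is undefined")
    return [(v - mean) / std for v in values], mean, std


def invert_standardize(scaled, mean, std):
    """Map standardized values back to the original units: x = z * std + mean."""
    return [z * std + mean for z in scaled]


tmean = [12.5, 15.0, 17.5, 20.0]  # made-up values for illustration
scaled, mu, sigma = standardize(tmean)
restored = invert_standardize(scaled, mu, sigma)
```

The same inverse formula can be applied to a single scaled number, for example a split threshold reported on the scaled axis.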
1
vote
1 answer

MySQL feature-scaling calculation

I need to formulate a MySQL query to select values normalized this way: normalized = (value - min(values)) / (max(values) - min(values)). My attempt looks like this: select Measurement_Values.Time, …
Andrea G
  • 63
  • 5
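
The formula in this question can be expressed with a one-row subquery that holds the minimum and maximum, joined to every row. The sketch below runs the query on an in-memory SQLite database from Python as a stand-in; it uses only standard SQL, but it has not been run against MySQL here. The column name `value` and the sample rows are assumptions, since the excerpt only shows `Measurement_Values.Time`.

```python
import sqlite3

# Hypothetical schema: only Measurement_Values.Time appears in the excerpt;
# the `value` column and the rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Measurement_Values (Time TEXT, value REAL)")
conn.executemany(
    "INSERT INTO Measurement_Values VALUES (?, ?)",
    [("t1", 10.0), ("t2", 15.0), ("t3", 20.0)],
)

# The subquery computes min and max once; CROSS JOIN attaches them to each row.
query = """
SELECT m.Time,
       (m.value - s.lo) / (s.hi - s.lo) AS normalized
FROM Measurement_Values AS m
CROSS JOIN (SELECT MIN(value) AS lo, MAX(value) AS hi
            FROM Measurement_Values) AS s
ORDER BY m.Time
"""
rows = conn.execute(query).fetchall()
conn.close()
```

If all values are identical, the denominator is zero; wrapping it as `NULLIF(s.hi - s.lo, 0)` makes the result NULL explicitly rather than relying on the engine's division-by-zero behaviour.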
1
vote
1 answer

Is it right to use different feature scaling techniques for different features?

I read this post about feature scaling: all-about-feature-scaling. The two main feature scaling techniques are: min-max scaler, which works well for features whose distributions are not Gaussian; standard scaler, which works well for…
user3668129
  • 4,318
  • 6
  • 45
  • 87
1
vote
0 answers

Why do we only use the fit() method on the training data when scaling?

In feature scaling, we only call the fit() method on the training data, not on the validation or test data. Why don't we use the mean and standard deviation of the validation or test data when scaling them?
1
vote
0 answers

Should we normalize / standardize / feature-scale a categorical variable?

Variable: 'Item_Fat_Content'. Values: 'Low Fat', 'Regular', 'High fat', 'No fat'. When converted into labels, these values become 0, 1, 2, 3. After standardising, they become numerical values such as 0.0, 0.4, 0.5, 0.9. Will Python…