Standardization (sometimes loosely called normalization) rescales a vector of real values so that it has a mean of zero and a standard deviation of one. The resulting values are also called standard scores or z-scores.
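As a minimal sketch of the definition above, each value x can be mapped to (x - mean) / standard deviation. The function name `standardize` and the sample data are illustrative assumptions, not part of the tag description, and the population standard deviation (dividing by n) is one common choice rather than the only one.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation.

    Hypothetical helper for illustration; it uses the population
    standard deviation (divide by n), which is an assumption here.
    """
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant vector has no spread, so z-scores are undefined.
        raise ValueError("cannot standardize a constant vector")
    return [(x - mu) / sigma for x in values]

# Illustrative data only.
z = standardize([10, 20, 30, 40])
print(fmean(z), pstdev(z))
```

Because of floating-point rounding, the mean of the result is typically a very small number near zero rather than exactly zero, and the standard deviation is near one. Using the sample standard deviation (dividing by n - 1) instead would give slightly different z-scores.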
Questions tagged [standardization]
72 questions
104 votes, 9 answers
Have there ever been silent behavior changes in C++ with new standard versions?
(I'm looking for an example or two to prove the point, not a list.)
Has it ever been the case that a change in the C++ standard (e.g. from 98 to 11, 11 to 14 etc.) changed the behavior of existing, well-formed, defined-behavior user code - silently?…

einpoklum
3 votes, 1 answer
Standardization or scaling of categorical variables
I am fairly new to data science. I am working on a use case of predicting sales demand using linear regression, with product no. and store no. as predictors. There can be many stores and products with numeric values. Do I need to standardize or scale…

avani jain
3 votes, 1 answer
How to implement PySpark StandardScaler on subset of columns?
I want to use pyspark StandardScaler on 6 out of 10 columns in my dataframe. This will be part of a pipeline.
The inputCol parameter seems to expect a vector, which I can pass in after using VectorAssembler on all my features, but this scales all 10…

Insu Q
3 votes, 1 answer
Python: 'StandardScaler' object has no attribute '_validate_data'
I recently updated my sklearn. However, since the upgrade I'm getting the error "'StandardScaler' object has no attribute '_validate_data'". The following is a snippet of the code:
Xs = pd.DataFrame([[10,20], [20,30], [30,40], [40,50]])
scalerx =…

Kalpit Narvekar
3 votes, 4 answers
Why don't the authors of the C99 standard specify a standard for the size of floating point types?
I noticed on Windows and Linux x86, float is a 4-byte type, double is 8, but long double is 12 and 16 on x86 and x86_64 respectively. C99 is supposed to be breaking such barriers with the specific integral sizes.
The initial technological limitation…

j riv
2 votes, 1 answer
Rarefy my species data based on individuals
I am new to R so I apologize in advance. I sampled moths along an elevational gradient with a total of 8 different sites. I had unequal sampling nights per elevation. Because of my unequal sampling nights, I want to standardize my species by…

Pal
2 votes, 0 answers
Reverse the scale of the test outcome in the LSTM
I am using standardized predictors in the training set to train an LSTM model. After I predict the outcome in the test set, I need to reverse the predicted score back to the original scale. Normally I could just use the predicted score * SD of the training…

user11806155
2 votes, 2 answers
Why hasn't C++ standardized overloads of algorithms which operate on entire containers?
Standard ISO C++ has a rich algorithm library including plenty of syntactic sugar like std::max_element, std::fill, std::count, etc.
I'm having a hard time understanding why ISO saw fit to standardize many such trivial algorithms, yet not overloads…

Tumbleweed53
2 votes, 1 answer
What is the correct way to use standardization/normalization in combination with K-Fold Cross Validation?
I have always learned that standardization or normalization should be fit only on the training set, and then be used to transform the test set. So what I'd do is:
scaler = StandardScaler()
scaler.fit_transform(X_train)
scaler.transform(X_test)
Now…

Sievag
2 votes, 1 answer
RegEx question: standardization of medical terms
I need to detect words such as 'bot/hersen/levermetastase' and transform them into 'botmetastase, hersenmetastase, levermetastase'.
But also 'lever/botmetastase' into 'levermetastase, botmetastase'.
So I need to be sure the "word/word/word metastase" is…

LaureAnne
2 votes, 1 answer
Sklearn.pipeline producing incorrect result
I am trying to construct a pipeline with a StandardScaler() and LogisticRegression(). I get different results when I code it with and without the pipeline. Here's my code without the pipeline:
clf_LR = linear_model.LogisticRegression()
scalar =…

Dona Ray
2 votes, 1 answer
StandardScaler giving non-uniform standard deviation
My problem setup is as follows: Python 3.7, Pandas version 1.0.3, and sklearn version 0.22.1. I am applying a StandardScaler (to every column of a float matrix) as usual. However, the columns that I get out do not have standard deviation = 1, while…

Zhubarb
1 vote, 1 answer
Does sklearn.preprocessing StandardScaler convert the data into a standard normal distribution?
StandardScaler() from sklearn.preprocessing claims to make mean=0 and std=1. In reality, the mean is a very small number close to 0 and, similarly, the std is close to 1 but not equal to it. Does it really convert the data into a standard normal distribution as it…

arti gupta
1 vote, 1 answer
Reverse standardization after removing rows
I have been working with R for about six months now, and so I am still somewhat of a novice with a lot of this. I have a large dataset of 260 columns with 1000 rows and I need to convert the data to standard deviation units and then removing…

jdtrulson
1 vote, 1 answer
Calculate crude and adjusted rates per subgroup using ageadjust.direct
I am trying to calculate the incidence of a disease per year and per age category. I also want to apply direct standardization. I'm using the function ageadjust.direct (package epitools).
age_cat persondays_individual contactfirst_cat ESPpop…

RvS