Questions tagged [standardization]

Standardization, or normalization, is a process that rescales a vector of real numbers so that it has a mean of zero and a standard deviation of one. The resulting values are called standard scores or z-scores.
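For a value x in a vector with mean μ and standard deviation σ, the z-score is z = (x - μ) / σ. The following is a minimal sketch in plain Python. The function name `standardize` is hypothetical. It uses the population standard deviation (dividing by n), which is the convention scikit-learn's StandardScaler documents. Unlike StandardScaler, this sketch raises an error on a constant vector instead of leaving it unscaled.

```python
import statistics

def standardize(values):
    """Return z-scores (x - mean) / sd, using the population SD (divide by n)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # A constant vector has no spread, so z-scores are undefined.
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(x - mu) / sigma for x in values]

z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)  # mean 5, SD 2 -> [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```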

72 questions
104 votes, 9 answers

Have there ever been silent behavior changes in C++ with new standard versions?

(I'm looking for an example or two to prove the point, not a list.) Has it ever been the case that a change in the C++ standard (e.g. from 98 to 11, 11 to 14 etc.) changed the behavior of existing, well-formed, defined-behavior user code - silently?…
einpoklum
3 votes, 1 answer

Standardization or scaling of categorical variables

I am fairly new to data science. I am working on a use case that predicts sales demand with linear regression, using product number and store number as predictors. There can be many stores and products, each identified by a numeric value. Do I need to standardize or scale…
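Store and product numbers are labels rather than quantities. Standardizing them would still treat store 330 as "more" than store 101. A common alternative is one-hot (indicator) encoding. The following is a minimal sketch in plain Python. The helper name `one_hot` and the store numbers are hypothetical. In a linear regression with an intercept, one indicator column is usually dropped to avoid perfect collinearity.

```python
def one_hot(ids):
    """Map each categorical ID to a 0/1 indicator vector (one column per category)."""
    categories = sorted(set(ids))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for value in ids:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column belonging to this category
        rows.append(row)
    return categories, rows

stores = [101, 205, 101, 330]  # hypothetical store numbers
cats, encoded = one_hot(stores)
print(cats)     # [101, 205, 330]
print(encoded)  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
```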
3 votes, 1 answer

How to implement PySpark StandardScaler on subset of columns?

I want to use PySpark's StandardScaler on 6 of the 10 columns in my DataFrame. This will be part of a pipeline. The inputCol parameter seems to expect a vector, which I can pass in after using VectorAssembler on all my features, but this scales all 10…
Insu Q
3 votes, 1 answer

Python: 'StandardScaler' object has no attribute '_validate_data'

I recently updated my sklearn. However, since the upgrade I'm getting the error "'StandardScaler' object has no attribute '_validate_data'". The following is a snippet of the code: Xs = pd.DataFrame([[10,20], [20,30], [30,40], [40,50]]) scalerx =…
3 votes, 4 answers

Why don't the authors of the C99 standard specify a standard for the size of floating point types?

I noticed on Windows and Linux x86, float is a 4-byte type, double is 8, but long double is 12 and 16 on x86 and x86_64 respectively. C99 is supposed to be breaking such barriers with the specific integral sizes. The initial technological limitation…
j riv
2 votes, 1 answer

Rarefy my species data based on individuals

I am new to R so I apologize in advance. I sampled moths along an elevational gradient with a total of 8 different sites. I had unequal sampling nights per elevation. Because of my unequal sampling nights, I want to standardize my species by…
Pal
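Rarefaction standardizes richness to a common number of individuals n. It uses Hurlbert's expected species richness, E[S_n] = Σ_i [1 - C(N - N_i, n) / C(N, n)]. Here N is the total number of individuals, N_i is the count of species i, and C is the binomial coefficient. The following is a minimal sketch in Python, not the R workflow the question uses. The function name `rarefied_richness` and the example counts are hypothetical. In R, the vegan package's `rarefy()` addresses the same task.

```python
from math import comb

def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n individuals
    drawn without replacement (Hurlbert's rarefaction formula)."""
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the total number of individuals")
    denom = comb(total, n)
    # comb(a, n) is 0 when a < n, so a species that must appear contributes 1.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)

site_counts = [5, 3, 2]  # hypothetical individuals per species at one site
print(rarefied_richness(site_counts, 10))  # full sample: all 3 species -> 3.0
print(rarefied_richness(site_counts, 4))
```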
2 votes, 0 answers

Reverse the scale of the test outcome in the LSTM

I am using standardized predictors in the training set to train an LSTM model. After I predict the outcome in the test set, I need to transform the predicted score back to the original scale. Normally I could just use the predicted score * SD of the training…
user11806155
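Reversing standardization inverts z = (y - μ) / σ to give y = z · σ + μ. This only works if you use the same mean and SD that were used to standardize the variable, including the same n or n - 1 divisor. The following is a minimal sketch in plain Python. The outcome values and the name `unstandardize` are hypothetical.

```python
import statistics

# Hypothetical training outcome; keep these statistics from standardization time.
y_train = [10.0, 14.0, 10.0, 14.0]
mu = statistics.fmean(y_train)       # 12.0
sigma = statistics.pstdev(y_train)   # 2.0 (population SD, divide by n)

def unstandardize(z_scores, mean, sd):
    """Invert z = (y - mean) / sd, giving y = z * sd + mean."""
    return [z * sd + mean for z in z_scores]

predicted_z = [-1.0, 0.0, 0.5]  # hypothetical model output on the z scale
print(unstandardize(predicted_z, mu, sigma))  # [10.0, 12.0, 13.0]
```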
2 votes, 2 answers

Why hasn't C++ standardized overloads of algorithms which operate on entire containers?

Standard ISO C++ has a rich algorithm library including plenty of syntactic sugar like std::max_element, std::fill, std::count, etc. I'm having a hard time understanding why ISO saw fit to standardize many such trivial algorithms, yet not overloads…
Tumbleweed53
2 votes, 1 answer

What is the correct way to use standardization/normalization in combination with K-Fold Cross Validation?

I have always learned that standardization or normalization should be fit only on the training set, and then be used to transform the test set. So what I'd do is: scaler = StandardScaler() scaler.fit_transform(X_train) scaler.transform(X_test) Now…
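With K-fold cross-validation, the same rule applies inside every fold. Compute the mean and SD from that fold's training part only, then apply them to both the training part and the held-out part. The following is a minimal sketch in plain Python with contiguous, unshuffled folds. The names `kfold_indices` and `scaled_folds` are hypothetical. In scikit-learn, the usual equivalent is to put StandardScaler in a Pipeline and pass the pipeline to `cross_val_score`, which refits it on each training fold.

```python
import statistics

def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, for brevity)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def scaled_folds(x, k):
    """Yield (train_z, test_z) for each fold, fitting the scaler on the training part only."""
    for test_idx in kfold_indices(len(x), k):
        held_out = set(test_idx)
        train = [x[i] for i in range(len(x)) if i not in held_out]
        test = [x[i] for i in test_idx]
        mu = statistics.fmean(train)      # statistics from the training fold only
        sigma = statistics.pstdev(train)  # assumes the training fold is not constant
        yield ([(v - mu) / sigma for v in train],
               [(v - mu) / sigma for v in test])

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
for train_z, test_z in scaled_folds(x, k=3):
    print([round(v, 3) for v in test_z])
```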
2 votes, 1 answer

RegEx question: standardization of medical terms

I need to detect words such as 'bot/hersen/levermetastase' and transform them into 'botmetastase, hersenmetastase, levermetastase', and also 'lever/botmetastase' into 'levermetastase, botmetastase'. So I need to be sure the "word/word/word metastase" is…
LaureAnne
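One way to express this with a regular expression is to capture the slash-separated prefixes before a shared "metastase" suffix, then rebuild each term in a replacement function. The following is a minimal sketch using Python's `re` module. The names `PATTERN` and `expand_metastase` are hypothetical. It only handles the two shapes shown in the question, and a single term without a slash is left unchanged.

```python
import re

# One or more slash-separated prefixes followed by "metastase",
# e.g. "bot/hersen/levermetastase" or "lever/botmetastase".
PATTERN = re.compile(r"\b(\w+(?:/\w+)+)metastase\b")

def expand_metastase(text):
    """Expand 'a/b/cmetastase' into 'ametastase, bmetastase, cmetastase'."""
    def repl(match):
        prefixes = match.group(1).split("/")
        return ", ".join(p + "metastase" for p in prefixes)
    return PATTERN.sub(repl, text)

print(expand_metastase("bot/hersen/levermetastase"))
print(expand_metastase("lever/botmetastase"))
```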
2 votes, 1 answer

Sklearn.pipeline producing incorrect result

I am trying to construct a pipeline with a StandardScaler() and LogisticRegression(). I get different results when I code it with and without the pipeline. Here's my code without the pipeline: clf_LR = linear_model.LogisticRegression() scalar =…
2 votes, 1 answer

StandardScaler giving non-uniform standard deviation

My problem setup is as follows: Python 3.7, Pandas version 1.0.3, and sklearn version 0.22.1. I am applying a StandardScaler (to every column of a float matrix) per usual. However, the columns that I get out do not have standard deviation =1, while…
Zhubarb
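A frequent source of this mismatch is the divisor. StandardScaler uses the biased SD (dividing by n, like `numpy.std(x, ddof=0)`), while pandas' `DataFrame.std()` defaults to `ddof=1`. Checking the scaled columns with pandas therefore reports about sqrt(n / (n - 1)) instead of 1, and the gap shrinks as n grows. Separately, StandardScaler leaves zero-variance columns unscaled. The following is a minimal sketch in plain Python that reproduces the divisor effect. The column values are hypothetical.

```python
import statistics

column = [1.0, 2.0, 3.0, 4.0]
mu = statistics.fmean(column)
sigma = statistics.pstdev(column)        # divides by n, like StandardScaler
z = [(x - mu) / sigma for x in column]

print(statistics.pstdev(z))  # ddof=0: 1.0, up to floating-point rounding
print(statistics.stdev(z))   # ddof=1 (pandas' default for .std()): about 1.1547
```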
1 vote, 1 answer

Does sklearn.preprocessing StandardScaler convert the data into a standard normal distribution?

StandardScaler() from sklearn.preprocessing claims to make mean=0 and std=1. In reality, mean is a very small number close to 0 and similarly, std is close to 1 but not equal. Does it really convert the data into standard normal distribution as it…
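Standardization is the linear map z = (x - μ) / σ. It shifts and rescales the data but does not change the shape of its distribution. A skewed column stays skewed, so the result has mean 0 and SD 1 without being standard normal. A mean such as 1e-17 instead of exactly 0 is ordinary floating-point rounding. The following is a minimal sketch in plain Python. The data and the helper `skewness` (population skewness, the mean of cubed z-scores) are hypothetical illustrations.

```python
import statistics

def skewness(values):
    """Population skewness: the mean of cubed z-scores."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return statistics.fmean([((v - mu) / sigma) ** 3 for v in values])

data = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]  # hypothetical right-skewed sample
mu, sigma = statistics.fmean(data), statistics.pstdev(data)
z = [(v - mu) / sigma for v in data]

print(statistics.fmean(z))          # about 0, possibly with a tiny rounding residue
print(statistics.pstdev(z))         # about 1
print(skewness(data), skewness(z))  # equal up to rounding: the shape is unchanged
```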
1 vote, 1 answer

Reverse standardization after removing rows

I have been working with R for about six months now, so I am still somewhat of a novice with a lot of this. I have a large dataset of 260 columns and 1000 rows, and I need to convert the data to standard deviation units and then remove…
jdtrulson
1 vote, 1 answer

Calculate crude and adjusted rates per subgroup using ageadjust.direct

I am trying to calculate the incidence of a disease per year and per age category. I also want to apply direct standardization. I'm using the function ageadjust.direct (package epitools). age_cat persondays_individual contactfirst_cat ESPpop…
RvS
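Direct standardization weights each age-specific rate (events / person-time) by a standard population and divides by the total of those weights. The crude rate is simply total events over total person-time. Running this once per subgroup (for example per year) gives the per-subgroup rates. The following is a minimal sketch in plain Python, not a reimplementation of epitools. The function name, subgroup data, and standard-population weights are hypothetical, and `ageadjust.direct` also reports confidence limits, which this sketch omits.

```python
def direct_adjusted_rate(events, person_time, std_pop):
    """Return (crude rate, directly age-standardized rate) for one subgroup."""
    if not (len(events) == len(person_time) == len(std_pop)):
        raise ValueError("all inputs need one entry per age category")
    crude = sum(events) / sum(person_time)
    age_specific = [e / t for e, t in zip(events, person_time)]
    # Weighted average of age-specific rates, weights = standard population.
    adjusted = sum(r * w for r, w in zip(age_specific, std_pop)) / sum(std_pop)
    return crude, adjusted

std_pop = [3000, 1000]  # hypothetical standard population per age category
by_year = {             # hypothetical (events, person-days) per age category
    2019: ([2, 10], [1000.0, 1000.0]),
    2020: ([4, 6], [1200.0, 800.0]),
}
for year, (events, pt) in by_year.items():
    crude, adjusted = direct_adjusted_rate(events, pt, std_pop)
    print(year, round(crude * 1e5, 1), round(adjusted * 1e5, 1))  # per 100,000 person-days
```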