I am working on a machine learning project and am unsure how to scale the test data. I understand that when scaling features, we fit the scaler object on the training data and then transform both the training and test data with that same fitted scaler.
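To make the workflow concrete, here is a minimal sketch of what I mean, in plain Python. The class `SimpleStandardScaler` and the toy values are hypothetical stand-ins I made up for illustration (not a real library API). I am assuming z-score standardization using the population standard deviation.

```python
from statistics import mean, pstdev


class SimpleStandardScaler:
    """Hypothetical stand-in for a library scaler (z-score standardization)."""

    def fit(self, values):
        # Statistics are computed from the data passed to fit() only.
        self.mean_ = mean(values)
        # Fall back to 1.0 for a constant feature to avoid dividing by zero.
        self.std_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        # Reuses the stored statistics; nothing is recomputed here.
        return [(v - self.mean_) / self.std_ for v in values]


# Toy single-feature data, made up for illustration.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 7.0]

scaler = SimpleStandardScaler().fit(train)   # fit on training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)         # same mean_/std_ as for train

print(scaler.mean_, scaler.std_)
print(test_scaled)
```

In this sketch `transform` never updates `mean_` or `std_`, so the test values are only shifted and rescaled with the training statistics.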
However, I am concerned about potential data leakage when scaling the test data. Since the scaler's parameters (e.g., mean and standard deviation) are calculated from the training data only, I am unsure whether it can scale the test data accurately without incorporating information from the test set, and whether applying it to the test data counts as leakage.
Could someone please clarify whether there is a risk of data leakage when transforming the test data with the same scaler object that was fitted on the training data? If so, what is the best way to mitigate this risk and ensure a reliable evaluation of the model's performance?
I appreciate any insights or guidance from the community to help address my confusion and ensure proper scaling practices in my machine learning project.
Thank you in advance for your help and expertise.