normalize input data based on a normalized dataset

Question

I have this code that normalizes a pandas dataframe.

import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn import preprocessing


df = pd.read_csv('DS/RS_DS/final_dataset.csv')
rec_df = df.drop(['person_id','encounter_id','birthdate','CN','HN','DN','DIAG_DM','DIAG_NONDM','TPN'], axis=1)

#normalize values from 0 to 1
df_val = rec_df.values
min_max_scaler = preprocessing.MinMaxScaler()
df_val_scaled = min_max_scaler.fit_transform(df_val)
df_scaled = pd.DataFrame(df_val_scaled)


df_flask = pd.DataFrame([[42.8,151,73,79,0,1,74]],columns=['weight','height','wc','hc','isMale','isFemale','age'])
df_flask_val = df_flask.values
df_flask_val_scaled = min_max_scaler.fit_transform(df_flask_val)
df_flask_scaled = pd.DataFrame(df_flask_val_scaled)

df_scaled is the normalized dataframe. df_flask is a single-row dataframe that I want to normalize using the same scaling as df_scaled, so that I can compare the two. However, df_flask_scaled contains all 0s; I think it wasn't normalized based on the original dataframe. Is there any way to normalize the single-row dataframe this way?

Or should I append this row to the original dataframe and then normalize everything together?

Your question isn't clear. Normalize the single-row df? What's that? – YOLO, Apr 08 '18 at 20:11
When you use an already fitted (learnt) model on new data, never call `fit()` or any method with 'fit' in its name, such as `fit_transform()`. Those re-fit on the new data, so what was learnt from the old data is forgotten. Only call `transform()` on new data. – Vivek Kumar, Apr 09 '18 at 09:08
I have two dataframes in the code: rec_df, and df_flask, which is my single-row dataframe. @YOLO – Reub, Apr 09 '18 at 17:22

score 0 · Answer 1 · answered Apr 09 '18 at 18:27

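I think you should do fit and transform separately. Fit the scaler once on the reference data (rec_df) so that it learns each column's minimum and maximum, then call transform on both the reference data and the new row. This ensures the new row is scaled with the same parameters as the data used in fitting.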

# initialise scaler
min_max_scaler = preprocessing.MinMaxScaler()

# fit here
min_max_scaler.fit(rec_df.values)

# apply transformation
df_val_scaled = min_max_scaler.transform(rec_df.values)
df_flask_val_scaled = min_max_scaler.transform(df_flask_val)
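
To see the difference end to end, here is a minimal, self-contained sketch. The three rows in rec_df below are made-up stand-ins (an assumption, since the real values live in final_dataset.csv); only the df_flask row comes from the question. It fits the scaler once on the reference data, reuses it on the single row, and, for contrast, shows that refitting on a single row gives all zeros because each column's min equals its max.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

columns = ['weight', 'height', 'wc', 'hc', 'isMale', 'isFemale', 'age']

# Hypothetical stand-in for rec_df; the real data comes from final_dataset.csv.
rec_df = pd.DataFrame(
    [[40.0, 150, 70, 75, 0, 1, 60],
     [60.0, 165, 80, 90, 1, 0, 70],
     [80.0, 180, 95, 100, 1, 0, 80]],
    columns=columns,
)

# The single row from the question.
df_flask = pd.DataFrame([[42.8, 151, 73, 79, 0, 1, 74]], columns=columns)

# Fit once on the reference data: this learns each column's min and max.
scaler = MinMaxScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(rec_df), columns=columns)

# Reuse the fitted scaler on the new row: transform() only, no refit.
df_flask_scaled = pd.DataFrame(scaler.transform(df_flask), columns=columns)

# For contrast: refitting on a single row means min == max in every column,
# so every scaled value collapses to 0.
refit_scaled = MinMaxScaler().fit_transform(df_flask)

print(df_flask_scaled)
print(refit_scaled)
```

With this approach there is no need to append the new row to the original dataframe before scaling. Note that if a new value falls outside the range seen during fitting, its scaled value will fall outside [0, 1].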

YOLO

Yeah, I tried this, but df_flask_val_scaled still comes back containing all 0's. – Reub Apr 10 '18 at 14:01
You need to provide sample data for me to test. – YOLO Apr 10 '18 at 14:07
