
I have a TensorFlow regression model that I have been working with. I have the model tuned well and I'm getting good results while training. However, when I go to evaluate, the results are horrible. I did some research and found that I am not normalizing my test features and labels, so I suspect that is where the problem is. My thought is to normalize the whole dataset before splitting it into train and test sets, but I am getting an AttributeError that has me stumped.

[screenshot of the AttributeError traceback]

Here is the code sample. Please help :)

  #concatenate the surface data and single_downhole_col into a single dataframe
  training_Data = pd.concat([surface_Data, single_downhole_col], axis=1)
  #print('training data shape:',training_Data.shape)
  #print(training_Data.head())

  #normalize the data using keras
  model_normalizer_layer = tf.keras.layers.Normalization(axis=-1)
  model_normalizer_layer.adapt(training_Data)
  normalized_training_Data = model_normalizer_layer(training_Data)

  #convert the data frame to array
  #NOTE: normalized_training_Data is a tf.Tensor at this point, not a DataFrame,
  #so the pandas calls below (.copy(), .tail(), .sample(), .drop()) raise the AttributeError
  dataset = normalized_training_Data.copy()
  dataset.tail()

  #create a training and test set
  train_dataset = dataset.sample(frac=0.8, random_state=0)
  test_dataset = dataset.drop(train_dataset.index)

  #check the data
  train_dataset.describe().transpose()

  #split features from labels
  train_features = train_dataset.copy()
  test_features = test_dataset.copy()

And if there is any interest in knowing how the normalizer layer is used in the model, please see below:

def build_and_compile_model(data):
    model = keras.Sequential([
        model_normalizer_layer,
        layers.Dense(260, input_dim=401, activation='relu'),
        layers.Dense(80, activation='relu'),
        #layers.Dense(40, activation='relu'),
        layers.Dense(1)
    ])
    #compile/return are not shown in the original snippet; a typical
    #regression setup is assumed here
    model.compile(loss='mean_absolute_error',
                  optimizer=tf.keras.optimizers.Adam(0.001))
    return model
TheNewGuy
  • The `normalized_training_Data` variable is a Tensor, not a pandas DataFrame. – Jul 24 '22 at 03:31
  • Can you elaborate for me? I know what you are saying, but what is your recommendation? Any help is greatly appreciated. – TheNewGuy Jul 24 '22 at 04:06
  • If you want to take a copy of the `normalized_training_Data` variable, you can do it with TensorFlow like this: `tf.identity(normalized_training_Data)`. And then to create a train/test split you can use sklearn's `sklearn.model_selection.train_test_split` (see the sketch after these comments). – Jul 24 '22 at 04:14
  • You are treating the `normalized_training_Data` variable as a pandas DataFrame when in reality it is a `tensor`; that's my point. I hope you understood :-) – Jul 24 '22 at 04:17
  • So how would you advise that I create normalized test data for the model.evaluate function? – TheNewGuy Jul 24 '22 at 04:33
  • Do you really want to add a normalization layer to the model? That will make the model run that layer every time during training. Why don't you just normalize offline, before training? For example, by normalizing the columns of the DataFrame. Check here: https://stackoverflow.com/questions/26414913/normalize-columns-of-pandas-data-frame – ai-py Jul 24 '22 at 07:20
  • My concern was having to normalize the data fed to the model in production. This is my first model, so I'm looking for guidance on best tactics. So is it better to normalize the DataFrame with pandas prior to training than to use `keras.layers.Normalization()` as I have? – TheNewGuy Jul 24 '22 at 16:15
  • @pritishmishra I implemented dataset normalization before training and my results improved drastically, thanks! One thing I am wondering about is how do I scale the model's predicted results in production back to the expected data range? – TheNewGuy Jul 25 '22 at 16:54
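
A minimal sketch of how the suggestions in these comments could be combined: split the raw DataFrame first with `train_test_split`, then adapt the Normalization layer on the training features only. It assumes `surface_Data` is the feature DataFrame and `single_downhole_col` is a single label Series, as in the question; everything else is illustrative.

import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split

# build the full DataFrame exactly as in the question
training_Data = pd.concat([surface_Data, single_downhole_col], axis=1)

# split while the data is still a DataFrame
train_dataset, test_dataset = train_test_split(training_Data,
                                               test_size=0.2, random_state=0)

# split features from labels (assumes single_downhole_col is a pandas Series)
label_col = single_downhole_col.name
train_features = train_dataset.drop(columns=[label_col])
train_labels = train_dataset[label_col]
test_features = test_dataset.drop(columns=[label_col])
test_labels = test_dataset[label_col]

# adapt on the training features only; the same layer inside the model then
# applies one consistent normalization to test and production inputs
model_normalizer_layer = tf.keras.layers.Normalization(axis=-1)
model_normalizer_layer.adapt(train_features.to_numpy())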

1 Answer


I found that quasimodos' suggestion of normalizing the dataset before feeding it to my model was the ideal solution. It scaled every column to the 0-1 range as expected and allowed me to inspect the data prior to training to validate that it was correct.

For whatever reason, `keras.layers.Normalization` was not working in my case.

from sklearn import preprocessing

# scale every column to the 0-1 range before training
x = training_Data.values
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
training_Data = pd.DataFrame(x_scaled, columns=training_Data.columns,
                             index=training_Data.index)

# the keras Normalization layer is then adapted on the already-scaled data
model_normalizer_layer = tf.keras.layers.Normalization(axis=-1)
model_normalizer_layer.adapt(training_Data)
normalized_training_Data = model_normalizer_layer(training_Data)

The only part that I have yet to figure out is how to scale the model's predictions back to the original range of the label column. I'm sure it's simple, but I'm stumped.
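
One common way to handle this, sketched below, is to fit a dedicated MinMaxScaler on the label column and keep it around so its `inverse_transform` can map predictions back to the original units. The names `raw_training_Data` and `LABEL_COLUMN` are placeholders for the unscaled DataFrame and the label column, not names from the code above.

from sklearn.preprocessing import MinMaxScaler

# fit a separate scaler on the original (unscaled) label column
label_scaler = MinMaxScaler()
y_scaled = label_scaler.fit_transform(raw_training_Data[['LABEL_COLUMN']])

# ... train the model on the scaled features and y_scaled ...

# predictions come out in the 0-1 range (shape: n_samples x 1);
# inverse_transform maps them back to the original units
predictions_scaled = model.predict(test_features)
predictions = label_scaler.inverse_transform(predictions_scaled)

Alternatively, the scaler that was fit on the whole DataFrame exposes per-column `data_min_` and `data_range_` arrays, so a single column i can be rescaled manually as `scaled * data_range_[i] + data_min_[i]`.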

TheNewGuy