Dataset: [image of the dataset]

The PV Yield (kWh) is my output; my model is supposed to predict it. This is what I have done so far. I have attached an image of the dataset. The columns from AirTemp to Zenith are my X, and Y is the PV Yield (kWh).

df=pd.read_csv("Data1.csv")
X=df.drop(['Date-PrimaryKey','output-PV Yield (kWh)'],axis=1)
Y=df['output-PV Yield (kWh)']

pca = PCA(n_components=9)
pca.fit(X_train)
X_train = pca.transform(X_train)
pca.fit(X_test)
X_test = pca.transform(X_test)

#normalizing the input values to fall in -1 to 1
X_train = X_train/180000000.0
X_test = X_test/180000000.0

#Creating Model
model = Sequential()
model.add(Dense(15, input_shape=(9,)))
model.add(Activation('tanh'))

model.add(Dense(11))
model.add(Activation('tanh'))

model.add(Dense(1))

model.summary()
sgd = optimizers.SGD(lr=0.1,momentum=0.2)
model.compile(loss='mean_absolute_error',optimizer=sgd,metrics=['accuracy'])

#Training
model.fit(X_train, train_y, epochs=20, batch_size = 50, validation_data=(X_test, test_y))

My weights are not getting updated. Accuracy is zero in all epochs.

  • Could you please [edit](https://stackoverflow.com/posts/65563606/edit) your question to better explain what you are trying to achieve and how? Also try to provide your data in a better form; check out [how to create pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Ruli Jan 04 '21 at 14:03
  • Where is `train_y` defined here? Are you extracting it from the data? – aryanknp Jan 04 '21 at 14:07
  • Accuracy is meaningless for regression tasks. – xdurch0 Jan 05 '21 at 17:15
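
The point in the last comment can be illustrated with a small sketch. The numbers below are made up for illustration and are not taken from the asker's dataset. An accuracy-style metric essentially counts how often a prediction equals the target exactly, which practically never happens for a continuous target, so it can sit at zero even while the model is learning. An error metric such as mean absolute error is the more informative thing to watch.

```python
import numpy as np

# Made-up PV yields (kWh) and made-up predictions, for illustration only.
y_true = np.array([12.4, 30.1, 7.8, 25.0])
y_pred = np.array([11.9, 31.0, 8.3, 24.2])

# Fraction of predictions that match the target exactly (accuracy-style).
exact_match_rate = np.mean(y_true == y_pred)

# Mean absolute error: average size of the prediction error, in kWh.
mae = np.mean(np.abs(y_true - y_pred))

print("exact match rate:", exact_match_rate)
print("mean absolute error:", mae)
```

Here the predictions are all within about 1 kWh of the targets, yet none matches exactly, so the exact-match rate says nothing useful about the quality of the fit.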

1 Answer

The model seems OK, but there are two problems I can spot quickly. The first is here:

pca = PCA(n_components=9)
pca.fit(X_train)
X_train = pca.transform(X_train)
pca.fit(X_test)
X_test = pca.transform(X_test)

Anything used to transform the data must not be fit on the test data. You fit it on the training samples and then use it to transform both the train and test parts. You should assume that you know nothing about the data you will be predicting on in production, e.g. you know nothing about tomorrow's weather, the results of sports matches a month from now, etc. You won't be able to fit on that data then, so you can't do so during training. The correct way:

pca = PCA(n_components=9)
pca.fit(X_train)
X_train = pca.transform(X_train)
X_test = pca.transform(X_test)
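
To see why refitting on the test set is a problem, here is a minimal sketch on synthetic data. The arrays, the 12-feature width, and the random seed are assumptions made for illustration, not the asker's data. A PCA fitted separately on the test split learns its own mean and its own axes, so the same row ends up at different coordinates than it would in the space the network was trained on.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real feature matrices (assumed: 12 raw features
# on very different scales, as in a mixed weather dataset).
scales = rng.uniform(1, 100, size=12)
X_train = rng.normal(size=(200, 12)) * scales
X_test = rng.normal(size=(50, 12)) * scales

pca_train = PCA(n_components=9).fit(X_train)  # fitted on training data only
pca_test = PCA(n_components=9).fit(X_test)    # a second, unrelated fit

# Project the same test row with both transformers.
row = X_test[:1]
proj_ok = pca_train.transform(row)
proj_bad = pca_test.transform(row)

# The two projections live in different coordinate systems.
same_coords = np.allclose(proj_ok, proj_bad)
print("same coordinates:", same_coords)
```

The network only ever learns the coordinate system of `pca_train`, so test inputs have to be expressed in that same system.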

The second, and seriously incorrect, thing is here:

#normalizing the input values to fall in -1 to 1
X_train = X_train/180000000.0
X_test = X_test/180000000.0

Of course you want to normalize your data, but this way you will end up with incredibly small decimals where the values are low, e.g. the AlbedoDaily column, and comparatively large values where the values are high, such as SurfacePressure. For such scaling you can use an existing class such as StandardScaler. The code is very simple, and each column is treated independently:

from sklearn.preprocessing import StandardScaler
transformer = StandardScaler().fit(X_train)
X_train = transformer.transform(X_train)
X_test = transformer.transform(X_test)
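
Both fixes can be combined in one scikit-learn pipeline, which makes it hard to fit anything on the test data by accident. This is only a sketch under assumptions: the data is synthetic, the train/test split is a guess at the step missing from the question (the names `train_y` and `test_y` follow the question's code), and scaling before PCA is one possible ordering rather than something stated above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumed: 12 features, 300 rows).
X = rng.normal(size=(300, 12)) * rng.uniform(1, 1000, size=12)
y = rng.uniform(0, 50, size=300)

# Assumed split step; the question does not show how the split was made.
X_train, X_test, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale each column independently, then reduce to 9 components.
preprocess = make_pipeline(StandardScaler(), PCA(n_components=9))

# fit_transform learns the statistics from the training data only;
# transform reuses them unchanged on the test data.
X_train_p = preprocess.fit_transform(X_train)
X_test_p = preprocess.transform(X_test)

print(X_train_p.shape, X_test_p.shape)
```

The transformed arrays can then be passed to `model.fit` in place of the manually divided ones.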

You have not provided or explained what your target variable is and where you get it from, so there could be other problems in your code that I cannot see right now.

Ruli