I have multiple CSV files, each with two columns of values, LMP and LMP_old, like this:
I use the following Python code to calculate the R² value for each file and plot its data.
import numpy as np
import pandas as pd
import glob
import matplotlib.pyplot as plt

for filepath in glob.iglob(r'*.csv'):
    print(filepath)
    df = pd.read_csv(filepath)
    x_values = df["LMP"]
    y_values = df["LMP_old"]

    # R² as the square of the Pearson correlation coefficient
    correlation_matrix = np.corrcoef(x_values, y_values)
    correlation_xy = correlation_matrix[0,1]
    r_squared = correlation_xy**2

    # scatter plot with a y = x reference line
    plt.scatter(x_values,y_values)
    plt.xlabel('Predicted LMP')
    plt.ylabel("Actual LMP")
    plt.title(r_squared)
    plt.xlim(20000, 26000)
    plt.ylim(20000, 26000)
    x = np.linspace(20000, 26000)
    plt.plot(x, x, linestyle='solid')
    plt.grid(True)
    plt.savefig(filepath+".png")

    # also save R² to a text file next to the CSV
    print(r_squared)
    with open(filepath+".txt", "w") as text_file:
        print(f"{r_squared}", file=text_file)
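The R² in the loop is the squared Pearson correlation from np.corrcoef. As a cross-check, here is a small sketch with made-up sample values (not data from my files). It shows that this number equals the coefficient of determination 1 - SS_res/SS_tot of an ordinary least-squares line fitted with np.polyfit. It does not measure the scatter around the y = x line drawn on the plot. The function names are hypothetical.

```python
import numpy as np


def r_squared_corrcoef(x, y):
    """R² as the square of the Pearson correlation, as in the loop above."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2


def r_squared_linear_fit(x, y):
    """R² of an ordinary least-squares line y ≈ slope*x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot


# Hypothetical sample values in the same range as the plot limits.
x = np.array([21000.0, 22000.0, 23500.0, 24000.0, 25500.0])
y = np.array([21200.0, 21900.0, 23000.0, 24400.0, 25300.0])

print(r_squared_corrcoef(x, y))
print(r_squared_linear_fit(x, y))
```

The two printed values agree up to floating-point rounding. If the goal is to score the predictions against y = x instead, the residuals would be y - x rather than the fitted line's residuals, and the result can differ.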
However, I found that x_values and y_values are not reset after each iteration. They seem to remember the values from the previous iteration and keep accumulating. What command do I need so that x_values and y_values are independent (reset) in each iteration?
Thank you very much.
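Below is a minimal sketch of the same per-file processing. It assumes the accumulation is what shows up in the saved PNGs. The pyplot calls (plt.scatter, plt.plot, ...) draw on the current figure, which persists across loop iterations until it is cleared or closed. In contrast, df, x_values and y_values are re-created from each CSV on every iteration. In this sketch each file gets its own figure from plt.subplots(), and that figure is closed with plt.close(fig) after saving. The function names plot_file and plot_all are hypothetical. The Agg backend is an assumption for saving PNGs without a display. The column names, labels and axis limits are taken from the code above.

```python
import glob

import matplotlib
matplotlib.use("Agg")  # assumption: only PNG files are saved, no interactive window needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_file(filepath):
    """Plot one CSV on its own figure and return its R² (squared Pearson r)."""
    df = pd.read_csv(filepath)  # a fresh DataFrame is read on every call
    x_values = df["LMP"]
    y_values = df["LMP_old"]
    r_squared = np.corrcoef(x_values, y_values)[0, 1] ** 2

    # A new figure and axes for this file only, so nothing drawn for earlier files is on it.
    fig, ax = plt.subplots()
    ax.scatter(x_values, y_values)
    ax.set_xlabel("Predicted LMP")
    ax.set_ylabel("Actual LMP")
    ax.set_title(f"R² = {r_squared:.4f}")
    ax.set_xlim(20000, 26000)
    ax.set_ylim(20000, 26000)
    line = np.linspace(20000, 26000)
    ax.plot(line, line, linestyle="solid")  # y = x reference line
    ax.grid(True)
    fig.savefig(filepath + ".png")
    plt.close(fig)  # discard the figure so the next file starts from a blank one

    with open(filepath + ".txt", "w") as text_file:
        print(f"{r_squared}", file=text_file)
    return r_squared


def plot_all(pattern="*.csv"):
    """Run plot_file on every matching CSV and return {filepath: r_squared}."""
    results = {}
    for filepath in glob.iglob(pattern):
        results[filepath] = plot_file(filepath)
        print(filepath, results[filepath])
    return results
```

Calling plot_all("*.csv") processes the files in the current directory. A shorter change to the original loop with the same effect would be to call plt.figure() at the start of each iteration and plt.close() after plt.savefig(...), or plt.clf() to clear the one figure.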