I have two columns: R_Nom = actual (nominal) value, R_Mes = measured value. So far I have:
- Computed a linear fit
- Computed % error
- Computed std
I want to do a log-log plot with:
- A scatter of the measured values (DONE)
- The linear fit (DONE)
- Error lines above and below the fitted line to indicate the area we expect the values to fall in (TO DO)
This is my code so far:
#@title stack example
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import numpy as np
import pandas as pd
R_Nom=[100, 100, 100, 100, 500, 500, 500, 500, 500, 800, 800, 800, 800, 1200, 1200, 1200, 1200, 1200, 1200, 5500, 5500,5500,5500,5500]
R_Mes=[102, 101, 98, 97, 512, 517, 492, 490, 470, 820, 825, 812, 790, 1250, 1240, 1230, 1212, 1175, 1153, 5600,5574,5440,5390,5612]
df= pd.DataFrame({"R_Nom": R_Nom, "R_Mes": R_Mes})
# finding the mean, max, min, std using groupby
mean= df.groupby('R_Nom')['R_Mes'].mean()
max_ = df.groupby('R_Nom')['R_Mes'].max()
min_ = df.groupby('R_Nom')['R_Mes'].min()
std = df.groupby('R_Nom')['R_Mes'].std()
# for the error % I need extra steps; can this be written like the lines above?
df['Error_percentage'] = abs((df["R_Mes"]-df["R_Nom"])/df["R_Nom"])*100
grouped_df = df.groupby("R_Nom")
max_error = grouped_df['Error_percentage'].max()
#output the data as a dataframe for plotting
summary = pd.concat({"mean":mean,
"min": min_ ,
"max": max_ ,
"std": std,
"max_error %": max_error}, axis=1)
summary
#fit the data
x=np.log10(df['R_Nom'])
y= np.log10(df['R_Mes'])
x_fit = np.linspace(min(x), max(x), 100)
model = np.polyfit(x, y, 1, full=True)
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(
go.Scatter(x=x_fit, y=x_fit*model[0][0]+model[0][1], name="Fit"),
secondary_y=False
)
fig.add_trace(
go.Scatter(x=x, y=y, name="Measured", mode="markers"),
secondary_y=False
)
fig.add_trace(
go.Scatter(x=np.log10(summary.index), y=summary['max_error %'], name="Error % "),
secondary_y=True
)
fig.show()
I have manually added the lines I mean: two lines that enclose the error area. They are not straight lines; they change depending on the % error or std value at each R_Nom value.
Secondly, are there other methods of estimating the error of my data that could be useful, such as R-squared?
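For context, the kinds of metrics in question could be computed roughly as follows. This is only a sketch using NumPy: R-squared and the residual standard error are taken for the log-log fit, and the RMS relative error is taken in linear units. Whether these are the right metrics for this data is exactly what is being asked.

```python
import numpy as np

R_Nom = np.array([100, 100, 100, 100, 500, 500, 500, 500, 500, 800, 800, 800,
                  800, 1200, 1200, 1200, 1200, 1200, 1200, 5500, 5500, 5500,
                  5500, 5500], dtype=float)
R_Mes = np.array([102, 101, 98, 97, 512, 517, 492, 490, 470, 820, 825, 812,
                  790, 1250, 1240, 1230, 1212, 1175, 1153, 5600, 5574, 5440,
                  5390, 5612], dtype=float)

# Same log-log linear fit as in the code above
x = np.log10(R_Nom)
y = np.log10(R_Mes)
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Coefficient of determination of the fit (in log space)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Residual standard error in log10 units (n - 2 degrees of freedom)
resid_std = np.sqrt(ss_res / (len(x) - 2))

# Root-mean-square relative error in percent (linear units)
rel_rmse = np.sqrt(np.mean(((R_Mes - R_Nom) / R_Nom) ** 2)) * 100

print(f"R^2 = {r_squared:.5f}")
print(f"residual std (log10 units) = {resid_std:.5f}")
print(f"RMS relative error = {rel_rmse:.2f} %")
```

Because R_Nom spans almost two decades, R-squared will be very close to 1 for nearly any reasonable fit here, so the residual-based numbers are probably more informative.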
The end goal is to evaluate the minimum change in R_Nom that I can detect.
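A rough sketch of how that end goal might be estimated is shown below. It assumes a change is "detectable" when it shifts the reading by at least `K` standard deviations, with `K = 3` picked purely for illustration, and it uses the local slope of the fitted power law as the sensitivity. Both the criterion and the names `K`, `sensitivity`, and `min_change` are assumptions, not an established method for this data.

```python
import numpy as np
import pandas as pd

R_Nom = [100, 100, 100, 100, 500, 500, 500, 500, 500, 800, 800, 800, 800,
         1200, 1200, 1200, 1200, 1200, 1200, 5500, 5500, 5500, 5500, 5500]
R_Mes = [102, 101, 98, 97, 512, 517, 492, 490, 470, 820, 825, 812, 790,
         1250, 1240, 1230, 1212, 1175, 1153, 5600, 5574, 5440, 5390, 5612]
df = pd.DataFrame({"R_Nom": R_Nom, "R_Mes": R_Mes})

# Log-log fit: R_Mes ~ 10**intercept * R_Nom**slope
slope, intercept = np.polyfit(np.log10(df["R_Nom"]), np.log10(df["R_Mes"]), 1)

# Measurement scatter at each nominal level (linear units)
std = df.groupby("R_Nom")["R_Mes"].std()
nom = std.index.to_numpy(dtype=float)

# Local sensitivity d(R_Mes)/d(R_Nom) of the fitted power law
fit = 10 ** (slope * np.log10(nom) + intercept)
sensitivity = slope * fit / nom

K = 3.0  # assumed coverage factor (hypothetical choice)
min_change = K * std.to_numpy() / sensitivity

result = pd.DataFrame(
    {
        "std": std.to_numpy(),
        "min_detectable_change": min_change,
        "min_detectable_change_%": 100 * min_change / nom,
    },
    index=std.index,
)
print(result)
```

With only four to six readings per level, the per-group std is itself quite uncertain, so the resulting numbers should be read as order-of-magnitude estimates.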