Python: Add error lines/shaded area for linear fit?

Question

I have two columns: R_Nom = actual (nominal) value, R_Mes = measured value. So far I have:

Computed a linear fit
Computed % error
Computed std

I want to do a log-log plot with:

A scatter of the measured values (DONE)
The linear fit (DONE)
Error lines above and below the fitted line to indicate the area we expect the value to be in (TO DO)

This is my code so far:

#@title stack example 
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import matplotlib.pyplot as plt


import numpy as np
import pandas as pd

R_Nom=[100, 100, 100, 100, 500, 500, 500, 500, 500, 800, 800, 800, 800, 1200, 1200, 1200, 1200, 1200, 1200, 5500, 5500,5500,5500,5500]
R_Mes=[102, 101, 98, 97, 512, 517, 492, 490, 470, 820, 825, 812, 790, 1250, 1240, 1230, 1212, 1175, 1153, 5600,5574,5440,5390,5612]

df= pd.DataFrame({"R_Nom": R_Nom, "R_Mes": R_Mes})

 
 # finding the mean, max, min, std using groupby 
mean= df.groupby('R_Nom')['R_Mes'].mean()
max_ = df.groupby('R_Nom')['R_Mes'].max()
min_ = df.groupby('R_Nom')['R_Mes'].min()
std = df.groupby('R_Nom')['R_Mes'].std()

 # for error% however i have to include extra steps - can it be written as above? 
df['Error_percentage'] = abs((df["R_Mes"]-df["R_Nom"])/df["R_Nom"])*100
grouped_df = df.groupby("R_Nom")
max_error = grouped_df['Error_percentage'].max()


 #output the data as a dataframe for plotting 
summary = pd.concat({"mean":mean, 
        "min": min_ ,
        "max": max_ ,
        "std": std, 
        "max_error %": max_error}, axis=1)

summary



 #fit the data


x=np.log10(df['R_Nom'])
y= np.log10(df['R_Mes'])


x_fit = np.linspace(min(x), max(x), 100)


model = np.polyfit(x, y, 1, full=True)
fig = make_subplots(specs=[[{"secondary_y": True}]])


fig.add_trace(
      go.Scatter(x=x_fit, y=x_fit*model[0][0]+model[0][1], name="Fit"),
                secondary_y=False 
  )
fig.add_trace(
      # measured points; named "Measured" so the legend does not show "Fit" twice
      go.Scatter(x=x, y=y, name="Measured", mode="markers"),
                secondary_y=False 
  )

fig.add_trace(
      go.Scatter(x=np.log10(summary.index), y=summary['max_error %'], name="Error % "),
                secondary_y=True 
  )


fig.show()

I have manually drawn the lines I mean: two lines that enclose the error area. They are not straight lines; they change depending on the % error or std value at each R_Nom value.
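A minimal sketch of one way such a band could be built, not a definitive answer. It assumes the spread is taken as the sample standard deviation of log10(R_Mes) at each nominal value, that it is linearly interpolated between the measured nominal values, and that the band half-width is k = 2 standard deviations. The helper name add_band and the fill colour are made up for illustration.

```python
import numpy as np

R_Nom = np.array([100, 100, 100, 100, 500, 500, 500, 500, 500, 800, 800, 800, 800,
                  1200, 1200, 1200, 1200, 1200, 1200, 5500, 5500, 5500, 5500, 5500], dtype=float)
R_Mes = np.array([102, 101, 98, 97, 512, 517, 492, 490, 470, 820, 825, 812, 790,
                  1250, 1240, 1230, 1212, 1175, 1153, 5600, 5574, 5440, 5390, 5612], dtype=float)

# Same log-log fit as in the question
x = np.log10(R_Nom)
y = np.log10(R_Mes)
slope, intercept = np.polyfit(x, y, 1)

# Spread of log10(R_Mes) at each distinct nominal value (sample std, ddof=1)
x_levels = np.unique(x)
std_levels = np.array([y[x == xv].std(ddof=1) for xv in x_levels])

# Fitted line on a dense grid, plus the spread interpolated onto that grid
x_fit = np.linspace(x.min(), x.max(), 100)
y_fit = slope * x_fit + intercept
std_fit = np.interp(x_fit, x_levels, std_levels)

k = 2.0  # assumed band half-width, in standard deviations
lower = y_fit - k * std_fit
upper = y_fit + k * std_fit


def add_band(fig, x_vals, lower_vals, upper_vals):
    """Hypothetical helper: shade the region between lower and upper on a plotly figure."""
    import plotly.graph_objects as go  # imported here so the numpy part runs without plotly

    # Invisible lower edge first, then the upper edge filled down to it
    fig.add_trace(go.Scatter(x=x_vals, y=lower_vals, mode="lines",
                             line=dict(width=0), showlegend=False))
    fig.add_trace(go.Scatter(x=x_vals, y=upper_vals, mode="lines",
                             line=dict(width=0), fill="tonexty",
                             fillcolor="rgba(0, 100, 200, 0.2)",
                             name="Fit +/- k std"))
    return fig
```

With the figure from the code above, calling add_band(fig, x_fit, lower, upper) before fig.show() would draw the shaded region on the primary y-axis. The two band traces have to be added one right after the other, because fill="tonexty" fills down to the previously added trace.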

Secondly, are there other methods of estimating the error of my data that could be useful, such as R-squared?
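For reference, R-squared and the residual standard deviation of the log-log fit can be computed directly from the fit residuals. This is only a sketch with numpy; the variable names r_squared and resid_std are my own, and both numbers describe the fit in log10 space, not in the original units.

```python
import numpy as np

R_Nom = np.array([100, 100, 100, 100, 500, 500, 500, 500, 500, 800, 800, 800, 800,
                  1200, 1200, 1200, 1200, 1200, 1200, 5500, 5500, 5500, 5500, 5500], dtype=float)
R_Mes = np.array([102, 101, 98, 97, 512, 517, 492, 490, 470, 820, 825, 812, 790,
                  1250, 1240, 1230, 1212, 1175, 1153, 5600, 5574, 5440, 5390, 5612], dtype=float)

x = np.log10(R_Nom)
y = np.log10(R_Mes)
slope, intercept = np.polyfit(x, y, 1)

# Residuals of the straight-line fit in log10 space
residuals = y - (slope * x + intercept)
ss_res = np.sum(residuals ** 2)        # unexplained variation
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean

r_squared = 1.0 - ss_res / ss_tot
# Residual standard deviation; n - 2 because two parameters were fitted
resid_std = np.sqrt(ss_res / (len(x) - 2))

print(f"slope={slope:.4f}, intercept={intercept:.4f}")
print(f"R^2={r_squared:.5f}, residual std (log10 units)={resid_std:.5f}")
```

Note that np.polyfit(x, y, 1, full=True), as used in the question, already returns the sum of squared residuals as its second element, so ss_res could also be read from model[1].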

The end goal is to evaluate the minimum change in R_Nom that I can detect.
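One rough way to put a number on this, as a sketch under stated assumptions and not a rigorous treatment: treat the scatter at each nominal value as random noise with standard deviation sigma, and call a change between two single readings detectable when it exceeds k times the standard deviation of their difference, which is sqrt(2)*sigma for two independent readings. The threshold k = 2 and the name min_change are assumptions made here for illustration.

```python
import numpy as np

R_Nom = np.array([100, 100, 100, 100, 500, 500, 500, 500, 500, 800, 800, 800, 800,
                  1200, 1200, 1200, 1200, 1200, 1200, 5500, 5500, 5500, 5500, 5500], dtype=float)
R_Mes = np.array([102, 101, 98, 97, 512, 517, 492, 490, 470, 820, 825, 812, 790,
                  1250, 1240, 1230, 1212, 1175, 1153, 5600, 5574, 5440, 5390, 5612], dtype=float)

k = 2.0  # assumed detection threshold, in standard deviations of the difference

# Sample standard deviation of the readings at each nominal value
levels = np.unique(R_Nom)
sigma = np.array([R_Mes[R_Nom == r].std(ddof=1) for r in levels])

# Difference of two independent readings has standard deviation sqrt(2) * sigma
min_change = k * np.sqrt(2.0) * sigma

for r, s, d in zip(levels, sigma, min_change):
    print(f"R_Nom={r:6.0f}: std={s:7.2f}, min detectable change ~ {d:7.2f} ({100 * d / r:.1f} %)")
```

This ignores any systematic offset between R_Nom and R_Mes and relies on only four to six readings per level, so the resulting figures should be read as order-of-magnitude estimates.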

You mean like [this](https://stackoverflow.com/a/43069856/8881141) or [this](https://stackoverflow.com/a/13157955/8881141)? — Mr. T, Oct 20 '20 at 18:30
Looks great. The only problem is that my error is defined only at the values where I measured it (e.g. 100, 500). How would I connect the points in between (e.g. 270)? Since the error and variance at 100 aren't the same as at 500, maybe I need to do a weighted average of err(100) and err(500)? — Leo, Oct 21 '20 at 07:40
That is now a stats question. I am sure scipy has a function to generate an interpolation of the error bars for a smoother, nonlinear representation. I tagged `scipy` in, maybe someone else knows what's the best (i.e., correct) way to do this. — Mr. T, Oct 21 '20 at 07:52
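On the weighted-average idea: linear interpolation is exactly a distance-weighted average of the two neighbouring values, and np.interp does it in one call. The sketch below checks this at R_Nom = 270 using the per-level standard deviation from the question's data; whether linear interpolation is the statistically right choice is a separate matter and is only assumed here.

```python
import numpy as np

R_Nom = np.array([100, 100, 100, 100, 500, 500, 500, 500, 500, 800, 800, 800, 800,
                  1200, 1200, 1200, 1200, 1200, 1200, 5500, 5500, 5500, 5500, 5500], dtype=float)
R_Mes = np.array([102, 101, 98, 97, 512, 517, 492, 490, 470, 820, 825, 812, 790,
                  1250, 1240, 1230, 1212, 1175, 1153, 5600, 5574, 5440, 5390, 5612], dtype=float)

# Standard deviation of the readings at each measured nominal value
r_levels = np.unique(R_Nom)
std_levels = np.array([R_Mes[R_Nom == r].std(ddof=1) for r in r_levels])

r_query = 270.0  # a nominal value that was never measured

# Linear interpolation between the neighbouring levels (100 and 500)
std_linear = np.interp(r_query, r_levels, std_levels)

# The same result written out as a weighted average of err(100) and err(500)
r_lo, r_hi = r_levels[0], r_levels[1]
w_hi = (r_query - r_lo) / (r_hi - r_lo)   # weight of the upper neighbour
std_manual = (1.0 - w_hi) * std_levels[0] + w_hi * std_levels[1]

# Alternative: interpolate along log10(R), which matches a log-log plot
std_log = np.interp(np.log10(r_query), np.log10(r_levels), std_levels)

print(f"std at 100: {std_levels[0]:.2f}, std at 500: {std_levels[1]:.2f}")
print(f"interpolated std at 270: linear in R = {std_linear:.2f}, linear in log10(R) = {std_log:.2f}")
```

For a smoother curve through the five levels, scipy.interpolate.PchipInterpolator could be swapped in for np.interp, although with so few levels the difference is likely to be small.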
