Adding a best fitting line to a series of plots

Question

Data:

import pandas as pd

data = {'PctUrban': [1.460065484046936, 1.460065484046936, 1.460065484046936, 1.460065484046936, 1.460065484046936, 1.460065484046936, 1.460065484046936, 1.460065484046936, 1.460065484046936, 1.460065484046936],
        'huc6': [30201, 30201, 30201, 30201, 30201, 30201, 30201, 30201, 30201, 30201], 'contam_gro': ['Butyltins', 'Chlordanes', 'Chlorobenzenes', 'DDTs', 'Dieldrins', 'Endolsulfans', 'HCHs', 'Other', 'PAHs', 'PBBs'],
        'matrix': ['SED', 'SED', 'SED', 'SED', 'SED', 'SED', 'SED', 'SED', 'SED', 'SED'], 'sample_typ': ['sediment', 'sediment', 'sediment', 'sediment', 'sediment', 'sediment', 'sediment', 'sediment', 'sediment', 'sediment'],
        'HUC_tot_conc_avg': [0.5375, 0.1654999999999999, 0.488499999999625, 0.373, 0.2996249999995875, 0.0075, 0.005, 0.0032083333333375, 81.17823809514286, 0.0],
        'region': ['South Atlantic', 'South Atlantic', 'South Atlantic', 'South Atlantic', 'South Atlantic', 'South Atlantic', 'South Atlantic', 'South Atlantic', 'South Atlantic', 'South Atlantic']}

Sediment = pd.DataFrame(data, index=range(0, 19, 2))

I am creating a series of plots in a loop. The contam_gro column has 13 unique values; the loop takes each unique value and creates a scatter plot for it. I need to add a trendline (best-fit line) to each plot.

The code for a loop to create the scatter plots:

import matplotlib.pyplot as plt

for contam in Contam_list:  # Contam_list (not shown) holds the unique contam_gro values
    Sediment.loc[Sediment['contam_gro'] == contam].plot(x='PctUrban', y='HUC_tot_conc_avg', kind='scatter', figsize=(7, 4))
    plt.title(contam, fontsize=18)  # plot title
    plt.xlabel('Urban %', fontsize=12)  # x-axis label
    plt.ylabel('Concentration', fontsize=12)  # y-axis label

plt.show()

I have to use matplotlib for this. I am not sure how and where to implement the trendline code into the loop.

I tried adding this piece:

Sediment.loc[(Sediment['contam_gro'] == contam)].plot(Sediment['PctUrban'],p(Sediment['PctUrban']),"r--", x='PctUrban', y='HUC_tot_conc_avg', kind='scatter', figsize=(7,4),
                                                                      z = np.polyfit(Sediment['PctUrban'], Sediment['HUC_tot_conc_avg'], 1), p = np.poly1d(z))

The error is: "dict() got multiple values for keyword argument 'x'"

`Sediment` is filtered inside the loop, so why aren't you filtering it when calling `np.polyfit`? — BigBen, Aug 24 '23 at 16:00
`.plot(Sediment_drop2['PctUrban'],p(Sediment_drop2['PctUrban']),...)` makes no sense. Please review [`pandas.DataFrame.plot.html`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html) for appropriate parameters. You are overlapping positional and keyword parameters. `p`, `Contam_list`, and `Sediment_drop2` are not defined. This code can't be run to produce the given error. — Trenton McKinney, Aug 24 '23 at 19:21
1. `import seaborn as sns` 2. `g = sns.lmplot(data=Sediment, x='PctUrban', y='HUC_tot_conc_avg', col='contam_gro', col_wrap=5, height=3)` or `g = sns.lmplot(data=Sediment, x='PctUrban', y='HUC_tot_conc_avg', hue='contam_gro', height=4)` — Trenton McKinney, Aug 24 '23 at 19:33
@TrentonMcKinney Thank you. I know Seaborn would probably be easier, but I have to use matplotlib for this. — ArianaMo, Aug 25 '23 at 19:42
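
A minimal matplotlib-only sketch of what the comments point toward: filter the frame once per contaminant, fit `np.polyfit` on that same filtered subset, and draw the line on the same axes as the scatter. The helper names `fit_line` and `plot_with_trendlines` are hypothetical, introduced only for illustration. The sketch assumes the columns contain no missing values, and it skips the line for any group that has fewer than two distinct `PctUrban` values (as in the ten-row sample above), since a straight line cannot be fitted there.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def fit_line(x, y):
    """Return (slope, intercept) of a least-squares line, or None if no line can be fitted.

    Hypothetical helper, not part of the original question.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # A degree-1 fit needs at least two distinct x values.
    if np.unique(x).size < 2:
        return None
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept


def plot_with_trendlines(df, x_col='PctUrban', y_col='HUC_tot_conc_avg',
                         group_col='contam_gro'):
    """Create one scatter plot per group and overlay a fitted line where possible.

    Returns a dict mapping each group value to its Figure.
    """
    figures = {}
    for contam in df[group_col].unique():
        # Filter once and reuse the subset for both the scatter and the fit.
        subset = df.loc[df[group_col] == contam]

        fig, ax = plt.subplots(figsize=(7, 4))
        ax.scatter(subset[x_col], subset[y_col])

        coeffs = fit_line(subset[x_col], subset[y_col])
        if coeffs is not None:
            slope, intercept = coeffs
            # Two end points are enough to draw a straight line.
            xs = np.array([subset[x_col].min(), subset[x_col].max()])
            ax.plot(xs, slope * xs + intercept, 'r--')

        ax.set_title(contam, fontsize=18)
        ax.set_xlabel('Urban %', fontsize=12)
        ax.set_ylabel('Concentration', fontsize=12)
        figures[contam] = fig
    return figures
```

With the full data set loaded into `Sediment`, calling `plot_with_trendlines(Sediment)` followed by `plt.show()` should display one figure per contaminant group. Plotting on an explicit `ax` keeps the scatter and the line on the same axes, which is harder to guarantee when the line is passed through `DataFrame.plot`.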
