
Is it possible to group the data (to define the x and y variables) before running a regression directly in regplot (or any other seaborn function)? I am unable to find a built-in feature of that sort.

For example, I have a categorical variable "C" in one column, and I am trying to fit a regression line (with x and y) using the median of x and y for each category of C. Is there any functionality to do so?

Prabhat

1 Answer

You need to group your data with pandas first and then plot the result with seaborn. Since you didn't provide your dataframe, I will use a seaborn sample dataset to demonstrate.

import pandas as pd
import seaborn as sns
# load dataframe
df = sns.load_dataset('car_crashes')

In this dataframe, the abbrev column is the categorical column. I will use the total and speeding columns as y and x, respectively.

First, call the pandas .groupby() method with your categorical variable and chain .median() onto it, so that pandas aggregates the data and returns the median of each numeric column for each category. The result is a dataframe indexed by the categorical variable, with one row per category.

Then select the columns you want to plot, in our case total and speeding, and pass them as x and y to seaborn's regplot().

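# group by the categorical column and take the median of each numeric column
medians = df.groupby('abbrev').median()
x = medians.speeding
y = medians.total
# plot; recent seaborn versions require x and y as keyword arguments
sns.regplot(x=x, y=y)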

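For a case closer to the question, where each category of "C" has several rows, here is a minimal sketch of the same idea. The dataframe and its values are made up for illustration, and the column names C, x and y are assumptions taken from the wording of the question, not from a real dataset.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: "C" is the categorical column, "x" and "y" are numeric.
df = pd.DataFrame({
    "C": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "x": [1.0, 2.0, 9.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0],
    "y": [2.0, 3.0, 4.0, 8.0, 9.0, 40.0, 15.0, 16.0, 17.0],
})

# One row per category, holding the median of each selected numeric column.
medians = df.groupby("C")[["x", "y"]].median()

# Fit and draw the regression on the per-category medians only.
ax = sns.regplot(data=medians, x="x", y="y")
```

Selecting the numeric columns explicitly before calling .median() avoids errors if the dataframe also contains other non-numeric columns.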

steven
  • Awesome, works for me. A follow-up question: the confidence interval shown now is computed from the x and y values obtained after the group by. Can we draw the confidence interval from the ungrouped data set instead (even though the regression is run on the grouped one)? – Prabhat Feb 28 '19 at 05:07
  • With `seaborn` you can only change the size of the CI. If you need the equations to compute it yourself, please see [this page](https://rpubs.com/aaronsc32/regression-confidence-prediction-intervals) and this [post](https://stackoverflow.com/questions/27164114/show-confidence-limits-and-prediction-limits-in-scatter-plot). If my answer is useful for your original question, please consider accepting it and upvoting. Thanks. – steven Feb 28 '19 at 12:49
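To illustrate what "changing the size of the CI" can look like, here is a small sketch using the ci parameter of regplot, which takes a percentage or None to hide the band. The data is randomly generated for illustration only. The band is always computed from whatever data is passed to regplot, so this does not by itself give an interval based on the ungrouped data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Made-up data for illustration: a noisy linear relationship.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 40)
data = pd.DataFrame({"x": x, "y": 2 * x + rng.normal(0, 3, 40)})

# Same regression drawn three times with a different confidence band size.
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, ci in zip(axes, [95, 68, None]):
    sns.regplot(data=data, x="x", y="y", ci=ci, ax=ax)
    ax.set_title(f"ci={ci}")
```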