How to plot a Pandas three level groupby using Plotly

Question

I'm new here, somewhat new to Pandas, and somewhat new to Plotly. I've looked around here and elsewhere for an answer, but haven't found anything that fits (at least nothing that I understood).

I have emissions data for 13 pollutants in 20 states for 5 years, with states classified as "Top" or "Bottom" by CAV_adoption (Clean Air Vehicle adoption). I'm trying to create a line plot for each pollutant using, for example, the CO2_avg_emm_by_CAV groupby result listed below. I want x = Year, y = "Total Emmisions", and color = CAV_adoption. Pardon the spelling errors; I'll fix them in my code later. Since this is a three-level groupby, I'm not sure how to access the data for plotting, or how to merge it back into one df as referenced here. If I could convert the groupby result to a normal df, with each group key filled into its own column, this would be easy.

Here is my GitHub with current code and a previous version of nei_by_pollutant data.

From top level of data down:

[In:] (there are some problems with the index and a weird first row, but they don't seem to cause errors)

nei_by_pollutant.head()

[Out:]

           Pollutant        Symbol  State  CAV_adoption  Year    Concern  Total Emmisions  UOM
Pollutant  NaN              NaN     NaN    NaN           NaN     NaN      NaN              NaN
Pollutant  Carbon Dioxide   CO2     VT     Top           2008.0  GHG      3.631471e+06     Tons
Pollutant  Carbon Monoxide  CO      VT     Top           2008.0  GHG      6.671013e+04     Tons
Pollutant  Chromium (VI)    Cr VI   VT     Top           2008.0  Health   6.162756e+01     Lb's
Pollutant  Manganese        Mn      VT     Top           2008.0  Health   3.047231e+01     Lb's
...

[In:]

avg_em_by_CAV = nei_by_pollutant.groupby(["Pollutant", "Year", "CAV_adoption"])["Total Emmisions"].mean()
avg_em_by_CAV.head(n=20)

[Out:]

Pollutant        Year    CAV_adoption
Carbon Dioxide   2008.0  Bottom          2.344468e+07
                         Top             2.364472e+07
                 2011.0  Bottom          2.348518e+07
                         Top             3.917672e+07
                 2014.0  Bottom          2.362971e+07
                         Top             3.952199e+07
                 2017.0  Bottom          2.437697e+07
                         Top             4.048388e+07
                 2020.0  Bottom          2.145373e+07
                         Top             3.653515e+07
Carbon Monoxide  2008.0  Bottom          4.756906e+05
                         Top             6.459687e+05
...
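
One way to avoid the MultiIndex in the first place may be to pass `as_index=False` to `groupby`, which keeps the group keys as ordinary columns. A minimal sketch, assuming a toy frame with the same column names as `nei_by_pollutant`; the name `toy` and all the numbers in it are made-up placeholders, not NEI data:

```python
import pandas as pd

# Toy stand-in for nei_by_pollutant; values are placeholders, not real emissions.
toy = pd.DataFrame({
    "Pollutant": ["Carbon Dioxide"] * 4 + ["Carbon Monoxide"] * 4,
    "State": ["VT", "CA", "TX", "WY"] * 2,
    "CAV_adoption": ["Top", "Top", "Bottom", "Bottom"] * 2,
    "Year": [2008.0] * 8,
    "Total Emmisions": [10.0, 30.0, 20.0, 40.0, 1.0, 3.0, 2.0, 4.0],
})

# as_index=False returns a flat DataFrame: one column per group key plus the mean.
tidy = toy.groupby(
    ["Pollutant", "Year", "CAV_adoption"], as_index=False
)["Total Emmisions"].mean()
print(tidy)
```

The result has the columns Pollutant, Year, CAV_adoption and Total Emmisions, with one row per group.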

[In:] (This is the data I'd like to plot for each pollutant)

CO2_avg_emm_by_CAV = avg_em_by_CAV['Carbon Dioxide']
CO2_avg_emm_by_CAV.head()

[Out:]

Year    CAV_adoption
2008.0  Bottom          2.344468e+07
        Top             2.364472e+07
2011.0  Bottom          2.348518e+07
        Top             3.917672e+07
2014.0  Bottom          2.362971e+07
Name: Total Emmisions, dtype: float64
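
For reference, a Series shaped like `CO2_avg_emm_by_CAV` still carries a two-level index (Year, CAV_adoption), and either level can be moved into columns. A rough sketch, rebuilding the 2008 and 2011 rows from the output above; the variable names `co2`, `wide` and `flat` are only illustrative:

```python
import pandas as pd

# Rebuild a small (Year, CAV_adoption) Series from the values printed above.
index = pd.MultiIndex.from_product(
    [[2008.0, 2011.0], ["Bottom", "Top"]], names=["Year", "CAV_adoption"]
)
co2 = pd.Series(
    [2.344468e07, 2.364472e07, 2.348518e07, 3.917672e07],
    index=index,
    name="Total Emmisions",
)

# unstack moves the CAV_adoption level into columns: one row per Year.
wide = co2.unstack("CAV_adoption")
print(wide)

# reset_index turns both index levels into columns: one row per (Year, CAV_adoption).
flat = co2.reset_index()
print(flat.columns.tolist())
```

The wide form has one column per adoption group; the flat form is the "one row per observation" layout that most plotting libraries expect.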

If I could get the data itself into the format below, I could plot it with the usual Plotly calls:

[In:]

avg_em_by_CAV.reset_index()

[Out:]

     Pollutant                   Year    CAV_adoption  Total Emisions
0    Carbon Dioxide              2008.0  HIGH          2.364472e+07
1    Carbon Dioxide              2008.0  LOW           2.344468e+07
2    Carbon Dioxide              2011.0  HIGH          3.917672e+07
3    Carbon Dioxide              2011.0  LOW           2.348518e+07
4    Carbon Dioxide              2014.0  HIGH          3.952199e+07
...  ...                         ...     ...           ...
125  Volatile Organic Compounds  2014.0  LOW           3.672417e+04
126  Volatile Organic Compounds  2017.0  HIGH          3.506524e+04
127  Volatile Organic Compounds  2017.0  LOW           2.747528e+04
128  Volatile Organic Compounds  2020.0  HIGH          2.183748e+04
129  Volatile Organic Compounds  2020.0  LOW           1.421041e+04

I was able to import the CSV to RStudio and easily plot this in ggplot. I'd still like to know how to do it in plotly though. — Scott McL, Jun 08 '23 at 23:25
No suggestions here? I really want to be able to plot this using Plotly. — Scott McL, Jun 27 '23 at 16:45
I'm sure you'll get a suggestion if you include a fully reproducible code snippet with sample data. Recreating the issue often takes a lot more time than solving the problem. — vestland, Jun 28 '23 at 06:45
