I'm new here, somewhat new to Pandas, and somewhat new to Plotly. I've looked around here and elsewhere for an answer but haven't found anything that fits (or at least nothing I understood well enough to apply).
I have emissions data for 13 pollutants in 20 states over 5 years, with the states split into "Top" or "Bottom" CAV_adoption (Clean Air Vehicle adoption). I'm trying to create a line plot for each pollutant using, for example, the CO2_avg_emm_by_CAV groupby result shown below. I want x = Year, y = "Total Emmisions", and color = CAV_adoption. Pardon the spelling errors in the column names; I'll fix them in my code later. Since this is a three-level groupby (the result is a Series with a three-level MultiIndex), I'm not sure how to access the data for plotting, or how to merge it back into one df as referenced here. If I could convert the groupby result into a normal df, with each grouping level filled into its own column, this would be easy.
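A minimal sketch of one way to get that "normal df" straight from the groupby, assuming pandas: passing `as_index=False` to `groupby` keeps the grouping keys as ordinary columns instead of building a MultiIndex. The toy `nei` frame is a hypothetical stand-in for nei_by_pollutant, and all of its numbers are made up.

```python
import itertools

import pandas as pd

# Hypothetical stand-in for nei_by_pollutant; every value here is made up.
combos = list(itertools.product(
    ["Carbon Dioxide", "Carbon Monoxide"],  # Pollutant
    [2008.0, 2011.0],                       # Year
    ["Bottom", "Top"],                      # CAV_adoption
    ["S1", "S2"],                           # State (placeholder codes)
))
nei = pd.DataFrame(combos, columns=["Pollutant", "Year", "CAV_adoption", "State"])
nei["Total Emmisions"] = [float(v) for v in range(1, len(nei) + 1)]

# as_index=False returns a flat DataFrame: the three keys stay as regular columns.
flat = nei.groupby(["Pollutant", "Year", "CAV_adoption"], as_index=False)["Total Emmisions"].mean()

# The rows for a single pollutant are then just a boolean filter.
co2 = flat[flat["Pollutant"] == "Carbon Dioxide"]
```

The result has one row per (Pollutant, Year, CAV_adoption) combination, which is the long format Plotly Express works with.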
Here is my GitHub repo with the current code and a previous version of the nei_by_pollutant data.
From the top level of the data down:
[In:] (there are some problems with the index and a weird first row, but they don't seem to cause errors):
nei_by_pollutant.head()
[Out:]
           Pollutant        Symbol  State  CAV_adoption  Year    Concern  Total Emmisions  UOM
Pollutant  NaN              NaN     NaN    NaN           NaN     NaN      NaN              NaN
Pollutant  Carbon Dioxide   CO2     VT     Top           2008.0  GHG      3.631471e+06     Tons
Pollutant  Carbon Monoxide  CO      VT     Top           2008.0  GHG      6.671013e+04     Tons
Pollutant  Chromium (VI)    Cr VI   VT     Top           2008.0  Health   6.162756e+01     Lb's
Pollutant  Manganese        Mn      VT     Top           2008.0  Health   3.047231e+01     Lb's
...
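The output above suggests the stray first row is entirely NaN and that every row carries the same index label "Pollutant". Assuming that is the case, a minimal cleanup sketch is to drop all-NaN rows and reset to a default integer index. Once the NaN row is gone, Year can probably go back to integers (pandas stores an integer column that contains NaN as float). The `raw` frame is a hypothetical reproduction with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical reproduction of the oddity: a repeated "Pollutant" index label
# and an all-NaN first row (values are made up).
raw = pd.DataFrame(
    {
        "Pollutant": [np.nan, "Carbon Dioxide", "Carbon Monoxide"],
        "Year": [np.nan, 2008.0, 2008.0],
        "Total Emmisions": [np.nan, 100.0, 10.0],
    },
    index=["Pollutant"] * 3,
)

# Drop rows where every column is NaN, then replace the repeated label with 0..n-1.
clean = raw.dropna(how="all").reset_index(drop=True)

# With the NaN row gone, Year can be stored as a plain integer.
clean["Year"] = clean["Year"].astype(int)
```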
[In:]
avg_em_by_CAV = nei_by_pollutant.groupby(["Pollutant", "Year", "CAV_adoption"])["Total Emmisions"].mean()
avg_em_by_CAV.head(n=20)
[Out:]
Pollutant        Year    CAV_adoption
Carbon Dioxide   2008.0  Bottom        2.344468e+07
                         Top           2.364472e+07
                 2011.0  Bottom        2.348518e+07
                         Top           3.917672e+07
                 2014.0  Bottom        2.362971e+07
                         Top           3.952199e+07
                 2017.0  Bottom        2.437697e+07
                         Top           4.048388e+07
                 2020.0  Bottom        2.145373e+07
                         Top           3.653515e+07
Carbon Monoxide  2008.0  Bottom        4.756906e+05
                         Top           6.459687e+05
...
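Because avg_em_by_CAV is a Series with a three-level MultiIndex, one option (a sketch, not the only way) is to move CAV_adoption into columns with `unstack`; selecting a pollutant with `.loc` then leaves a small table indexed by Year with Bottom and Top columns. The Series below is rebuilt from made-up numbers, using the same index names as the output above.

```python
import pandas as pd

# Rebuild a Series shaped like avg_em_by_CAV (values are made up).
idx = pd.MultiIndex.from_product(
    [["Carbon Dioxide", "Carbon Monoxide"], [2008.0, 2011.0], ["Bottom", "Top"]],
    names=["Pollutant", "Year", "CAV_adoption"],
)
avg_em_by_CAV = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=idx, name="Total Emmisions"
)

# Pivot the innermost level into columns: rows are (Pollutant, Year), columns Bottom/Top.
wide = avg_em_by_CAV.unstack("CAV_adoption")

# Partial indexing on the first level leaves a Year-indexed table for one pollutant.
co2_wide = wide.loc["Carbon Dioxide"]
```

Plotly Express also accepts wide-form data, so something like `px.line(co2_wide, y=["Bottom", "Top"])` should draw one line per column with the Year index on the x-axis. The long form is often easier to work with, though.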
[In:] (This is the data I'd like to plot for each pollutant)
CO2_avg_emm_by_CAV = avg_em_by_CAV['Carbon Dioxide']
CO2_avg_emm_by_CAV.head()
[Out:]
Year    CAV_adoption
2008.0  Bottom        2.344468e+07
        Top           2.364472e+07
2011.0  Bottom        2.348518e+07
        Top           3.917672e+07
2014.0  Bottom        2.362971e+07
Name: Total Emmisions, dtype: float64
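CO2_avg_emm_by_CAV is still a Series indexed by (Year, CAV_adoption), so calling `reset_index()` on it should turn both index levels into ordinary columns, with the Series name becoming the value column. A minimal sketch with a made-up stand-in for avg_em_by_CAV; `xs` with `level="Pollutant"` is shown as an equivalent way to take the same slice.

```python
import pandas as pd

# Made-up stand-in for avg_em_by_CAV with the same index names.
idx = pd.MultiIndex.from_product(
    [["Carbon Dioxide", "Carbon Monoxide"], [2008.0, 2011.0], ["Bottom", "Top"]],
    names=["Pollutant", "Year", "CAV_adoption"],
)
avg_em_by_CAV = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=idx, name="Total Emmisions"
)

# Same selection as in the question; the "Pollutant" level is dropped.
CO2_avg_emm_by_CAV = avg_em_by_CAV["Carbon Dioxide"]

# reset_index moves Year and CAV_adoption into columns.
co2_df = CO2_avg_emm_by_CAV.reset_index()

# Equivalent slice by level name.
co2_df_xs = avg_em_by_CAV.xs("Carbon Dioxide", level="Pollutant").reset_index()
```

After `import plotly.express as px`, `px.line(co2_df, x="Year", y="Total Emmisions", color="CAV_adoption")` should then give the CO2 plot.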
If I can get the data into this flat format, I could plot it the usual Plotly way:
[In:]
avg_em_by_CAV.reset_index()
[Out:]
     Pollutant                   Year    CAV_adoption  Total Emisions
0    Carbon Dioxide              2008.0  HIGH          2.364472e+07
1    Carbon Dioxide              2008.0  LOW           2.344468e+07
2    Carbon Dioxide              2011.0  HIGH          3.917672e+07
3    Carbon Dioxide              2011.0  LOW           2.348518e+07
4    Carbon Dioxide              2014.0  HIGH          3.952199e+07
...  ...                         ...     ...           ...
125  Volatile Organic Compounds  2014.0  LOW           3.672417e+04
126  Volatile Organic Compounds  2017.0  HIGH          3.506524e+04
127  Volatile Organic Compounds  2017.0  LOW           2.747528e+04
128  Volatile Organic Compounds  2020.0  HIGH          2.183748e+04
129  Volatile Organic Compounds  2020.0  LOW           1.421041e+04
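With the whole table flattened by `reset_index()`, one way to get a separate line plot per pollutant is to split it with `groupby("Pollutant")` and hand each piece to Plotly Express. This is a sketch under assumptions: `pollutant_frames` and `plot_pollutant` are hypothetical helper names, the data are made up, and plotly is imported inside the plotting function so the data preparation runs even where plotly isn't installed.

```python
import pandas as pd

# Made-up stand-in for avg_em_by_CAV with the same index names.
idx = pd.MultiIndex.from_product(
    [["Carbon Dioxide", "Carbon Monoxide"], [2008.0, 2011.0], ["Bottom", "Top"]],
    names=["Pollutant", "Year", "CAV_adoption"],
)
avg_em_by_CAV = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=idx, name="Total Emmisions"
)

# One flat DataFrame with columns Pollutant, Year, CAV_adoption, Total Emmisions.
flat = avg_em_by_CAV.reset_index()


def pollutant_frames(flat: pd.DataFrame) -> dict:
    """Hypothetical helper: split the flat table into one DataFrame per pollutant."""
    return {name: group for name, group in flat.groupby("Pollutant")}


def plot_pollutant(df: pd.DataFrame, name: str):
    """Hypothetical helper: build one Plotly Express line chart for a pollutant."""
    import plotly.express as px  # imported lazily so the data prep runs without plotly

    return px.line(df, x="Year", y="Total Emmisions", color="CAV_adoption", title=name)


frames = pollutant_frames(flat)
```

Usage would be something like `for name, df in frames.items(): plot_pollutant(df, name).show()`. One figure per pollutant seems preferable to a single combined plot here, because the units differ (Tons vs. Lb's in the UOM column) and the magnitudes span several orders, so a shared y-axis would flatten the smaller series.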