
I'm new here, somewhat new to Pandas, and somewhat new to Plotly. I've looked around here and elsewhere for an answer, but haven't found anything that fits (or at least nothing I understood well enough to apply).

I have emissions data for 13 pollutants in 20 states over 5 years, with each state classified as "Top" or "Bottom" by CAV_adoption (Clean Air Vehicle adoption). I'm trying to create a line plot for each pollutant using, for example, the CO2_avg_emm_by_CAV series shown below. I want x = Year, y = "Total Emmisions", and color = CAV_adoption. (Pardon the spelling errors in the column names; I'll fix them in my code later.) Since this is a three-level groupby, I'm not sure how to access the data for plotting, or how to merge it back into one df as referenced here. If I could convert the groupby result to a normal df, with each grouping key filled into its own column, this would be easy.

Here is my GitHub with current code and a previous version of nei_by_pollutant data.

From top level of data down:

[In:] (there are some problems with the index and a strange first row, but they don't seem to cause errors)

nei_by_pollutant.head()

[Out:]

           Pollutant        Symbol  State  CAV_adoption  Year    Concern  Total Emmisions  UOM
Pollutant  NaN              NaN     NaN    NaN           NaN     NaN      NaN              NaN
Pollutant  Carbon Dioxide   CO2     VT     Top           2008.0  GHG      3.631471e+06     Tons
Pollutant  Carbon Monoxide  CO      VT     Top           2008.0  GHG      6.671013e+04     Tons
Pollutant  Chromium (VI)    Cr VI   VT     Top           2008.0  Health   6.162756e+01     Lb's
Pollutant  Manganese        Mn      VT     Top           2008.0  Health   3.047231e+01     Lb's
...
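
The all-NaN first row and the repeated "Pollutant" index label can probably be removed before grouping. The following is a minimal sketch, assuming the frame looks like the head() output above. The small frame named sample is rebuilt by hand from three of those rows purely for illustration, and why the stray row appears in the real CSV is not known from what is shown here.

```python
import numpy as np
import pandas as pd

cols = ["Pollutant", "Symbol", "State", "CAV_adoption",
        "Year", "Concern", "Total Emmisions", "UOM"]

# Hand-built stand-in for nei_by_pollutant, copied from the head() output above.
rows = [
    [np.nan] * len(cols),  # the strange all-NaN first row
    ["Carbon Dioxide", "CO2", "VT", "Top", 2008.0, "GHG", 3.631471e+06, "Tons"],
    ["Carbon Monoxide", "CO", "VT", "Top", 2008.0, "GHG", 6.671013e+04, "Tons"],
]
sample = pd.DataFrame(rows, columns=cols, index=["Pollutant"] * len(rows))

# Drop rows in which every column is NaN, then replace the repeated
# "Pollutant" index label with a plain 0..n-1 integer index.
cleaned = sample.dropna(how="all").reset_index(drop=True)

# With the NaN row gone, Year can be stored as an integer (2008, not 2008.0).
cleaned["Year"] = cleaned["Year"].astype(int)

print(cleaned)
```

The same two calls should apply to the full nei_by_pollutant frame, provided the stray row really is NaN in every column.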

[In:]

avg_em_by_CAV = nei_by_pollutant.groupby(["Pollutant", "Year", "CAV_adoption"])["Total Emmisions"].mean()
avg_em_by_CAV.head(n=20)

[Out:]

Pollutant        Year    CAV_adoption
Carbon Dioxide   2008.0  Bottom          2.344468e+07
                         Top             2.364472e+07
                 2011.0  Bottom          2.348518e+07
                         Top             3.917672e+07
                 2014.0  Bottom          2.362971e+07
                         Top             3.952199e+07
                 2017.0  Bottom          2.437697e+07
                         Top             4.048388e+07
                 2020.0  Bottom          2.145373e+07
                         Top             3.653515e+07
Carbon Monoxide  2008.0  Bottom          4.756906e+05
                         Top             6.459687e+05
...

[In:] (This is the data I'd like to plot for each pollutant)

CO2_avg_emm_by_CAV = avg_em_by_CAV['Carbon Dioxide']
CO2_avg_emm_by_CAV.head()

[Out:]

Year    CAV_adoption
2008.0  Bottom          2.344468e+07
        Top             2.364472e+07
2011.0  Bottom          2.348518e+07
        Top             3.917672e+07
2014.0  Bottom          2.362971e+07
Name: Total Emmisions, dtype: float64

If I could get the data into the flat format below, I could plot it with the usual Plotly calls:

[In:]

avg_em_by_CAV.reset_index()

[Out:]

     Pollutant                   Year    CAV_adoption  Total Emisions
0    Carbon Dioxide              2008.0  HIGH          2.364472e+07
1    Carbon Dioxide              2008.0  LOW           2.344468e+07
2    Carbon Dioxide              2011.0  HIGH          3.917672e+07
3    Carbon Dioxide              2011.0  LOW           2.348518e+07
4    Carbon Dioxide              2014.0  HIGH          3.952199e+07
...  ...                         ...     ...           ...
125  Volatile Organic Compounds  2014.0  LOW           3.672417e+04
126  Volatile Organic Compounds  2017.0  HIGH          3.506524e+04
127  Volatile Organic Compounds  2017.0  LOW           2.747528e+04
128  Volatile Organic Compounds  2020.0  HIGH          2.183748e+04
129  Volatile Organic Compounds  2020.0  LOW           1.421041e+04
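
Once the groupby result is flattened with reset_index(), each index level becomes an ordinary column, which is the long format that Plotly Express expects. The following is a minimal sketch, assuming plotly is installed. The series is rebuilt by hand from the Carbon Dioxide values in the groupby output above, and plot_pollutant is a hypothetical helper name, not part of the original code.

```python
import pandas as pd

# Stand-in for avg_em_by_CAV, using the Carbon Dioxide means shown earlier.
index = pd.MultiIndex.from_product(
    [["Carbon Dioxide"], [2008, 2011, 2014, 2017, 2020], ["Bottom", "Top"]],
    names=["Pollutant", "Year", "CAV_adoption"],
)
avg_em_by_CAV = pd.Series(
    [2.344468e+07, 2.364472e+07,   # 2008: Bottom, Top
     2.348518e+07, 3.917672e+07,   # 2011
     2.362971e+07, 3.952199e+07,   # 2014
     2.437697e+07, 4.048388e+07,   # 2017
     2.145373e+07, 3.653515e+07],  # 2020
    index=index,
    name="Total Emmisions",
)

# Flatten the three index levels into regular columns (long/tidy format).
tidy = avg_em_by_CAV.reset_index()


def plot_pollutant(tidy_df, pollutant):
    """Return a line figure for one pollutant (hypothetical helper)."""
    import plotly.express as px  # imported here so the data prep runs without plotly

    subset = tidy_df[tidy_df["Pollutant"] == pollutant]
    return px.line(
        subset,
        x="Year",
        y="Total Emmisions",
        color="CAV_adoption",  # one line per adoption group
        title=pollutant,
    )
```

Calling plot_pollutant(tidy, "Carbon Dioxide").show() should display the two lines; looping over tidy["Pollutant"].unique() would give one figure per pollutant. Passing the whole frame to px.line with facet_col="Pollutant" and facet_col_wrap is another option, although the pollutants differ by orders of magnitude and would likely need independent y-axes.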
Scott McL
  • I was able to import the CSV to RStudio and easily plot this in ggplot. I'd still like to know how to do it in plotly though. – Scott McL Jun 08 '23 at 23:25
  • No suggestions here? I really want to be able to plot this using Plotly. – Scott McL Jun 27 '23 at 16:45
  • I'm sure you'll get a suggestion if you include a fully reproducible code snippet with sample data. Recreating the issue often takes a lot more time than solving the problem. – vestland Jun 28 '23 at 06:45
