How to make quadruple bar graph from this pandas dataframe?

Question

I am a new coder, and for my class we have an assignment where we are supposed to make an API call to an outside dataset and then plot something interesting about the data. I made my API call to a NYC tree census dataset. The data shows both tree species and health status (Good, Fair, Poor, Dead). I want to make a stacked bar plot showing the percentage of each health status for each tree species. For example, I want one bar for Maple trees, showing that 56% are good, 26% are fair, 13% are poor, and 5% are dead. I'm not really sure how to accomplish all of this. Here is a screenshot showing how my dataset looks. Thanks for any advice!

Dataframe Screenshot

It is not recommended that data be presented as images. It can be toy data and should be presented in text. It is also desirable to present the code that you are working on. This will reduce the burden on the respondent and make it easier to answer. — r-beginners, Jun 19 '21 at 07:53
For this kind of data, it is necessary to determine how many different types of trees there are and focus on the top trees to visualize. Once the tree types are narrowed down, we can calculate the composition ratio of them by health attributes and graph them. — r-beginners, Jun 19 '21 at 07:57
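A minimal sketch of the approach described in the comment above, on toy data invented for illustration. The column names `spc_common` and `health` are assumed to match the census data, and the helper `health_share_for_top_species` is hypothetical. It uses `pd.crosstab` with `normalize="index"` for the composition ratio, which is one of several ways to compute it.

```python
import pandas as pd

# Toy data, invented for illustration; the real census has far more rows and species.
trees = pd.DataFrame(
    {
        "spc_common": ["maple", "maple", "maple", "maple", "oak", "oak", "oak", "ginkgo"],
        "health": ["Good", "Good", "Fair", "Poor", "Good", "Fair", "Fair", "Good"],
    }
)


def health_share_for_top_species(df, n):
    """Keep the n most common species and return each one's health composition."""
    # value_counts sorts by frequency, so head(n) gives the n most common species
    top = df["spc_common"].value_counts().head(n).index
    subset = df[df["spc_common"].isin(top)]
    # normalize="index" divides each row by its total, so every species sums to 1
    return pd.crosstab(subset["spc_common"], subset["health"], normalize="index")


shares = health_share_for_top_species(trees, n=2)
print(shares)
```

Choosing `n` is a judgment call; a small value keeps the chart readable.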

Rob Raymond · Answer 1 · 2021-06-20T13:10:23.733

I've used Kaggle as the source of data. I did also find an API for this dataset, but I did not use it because it is so slow for me.
The data I've used has no dead trees; the only statuses are Poor, Fair and Good.
I have used the pandas-percentage-of-total-with-groupby technique to calculate the percentages.
I prefer plotly to matplotlib for plotting. Both are simple to use.
There really are too many bars for this to be a high-quality visualisation.
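The percentage-of-total idea mentioned in the notes above can be sketched on toy data as follows. The rows are invented for illustration, and this variant divides by a `transform("sum")` rather than using `apply`, so treat it as one possible form of the technique, not necessarily the exact code used further down.

```python
import pandas as pd

# Toy data, invented for illustration; column names are assumed to match the census CSV.
df_toy = pd.DataFrame(
    {
        "spc_common": ["maple"] * 4 + ["oak"] * 2,
        "health": ["Good", "Good", "Fair", "Poor", "Good", "Poor"],
    }
)

# Count trees per (species, health) pair
counts = df_toy.groupby(["spc_common", "health"]).size()

# Divide each count by the total for its species; transform keeps the original index
pct = counts / counts.groupby(level="spc_common").transform("sum")

# One row per species, one column per health status; missing combinations become 0
wide = pct.unstack("health", fill_value=0)
print(wide)
```

Each row of `wide` sums to 1, which is what a 100% stacked bar needs.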

get data from API (kaggle)

import kaggle.cli
import sys
import pandas as pd
from pathlib import Path
from zipfile import ZipFile

# search for data set
# sys.argv = [sys.argv[0]] + "datasets list -s \"2015-street-tree-census-tree-data.csv\"".split(" ")
# kaggle.cli.main()

# download data set
sys.argv = [sys.argv[0]] + "datasets download new-york-city/ny-2015-street-tree-census-tree-data".split(" ")
kaggle.cli.main()

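# open the downloaded archive and list its contents (displayed when run in a notebook)
zfile = ZipFile("ny-2015-street-tree-census-tree-data.zip")
zfile.infolist()

# read the first file in the archive, which is the CSV
df = pd.read_csv(zfile.open(zfile.infolist()[0]))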

prepare data and plot using `plotly`

import plotly.express as px

spc = 'spc_common'

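# aggregate the data and shape it for plotting
dfa = (
    df.groupby([spc, "health"])
    .agg({"tree_id": "count"})  # number of trees per species and health status
    # group_keys=False keeps the original index; pandas >= 2.0 would otherwise
    # prepend the group key and duplicate the species level
    .groupby(level=spc, group_keys=False)
    .apply(lambda x: x / x.sum())  # share of each status within a species
    .unstack("health")
    .droplevel(0, axis=1)  # drop the "tree_id" column level
)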

fig = px.bar(
    dfa.reset_index(),
    x=spc,
    y=["Poor", "Fair", "Good"],
    color_discrete_sequence=["red", "blue", "green"],
)
fig.update_layout(yaxis={"tickformat": "%"})

output

matplotlib

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14, 3))
dfa.plot(kind="bar", stacked=True, ax=ax)
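As a possible refinement of the matplotlib version, the y-axis can be labelled in percent with `PercentFormatter`. This is a sketch on a small stand-in frame (`shares_toy`, invented for illustration) with the same shape as `dfa`; the colours mirror the plotly example, and the `Agg` backend is selected only so that the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Stand-in for dfa: one row per species, one column per health status, rows sum to 1
shares_toy = pd.DataFrame(
    {"Poor": [0.25, 0.5], "Fair": [0.25, 0.0], "Good": [0.5, 0.5]},
    index=pd.Index(["maple", "oak"], name="spc_common"),
)

fig, ax = plt.subplots(figsize=(6, 3))
shares_toy.plot(kind="bar", stacked=True, ax=ax, color=["red", "blue", "green"])

# xmax=1.0 tells the formatter that a value of 1.0 means 100%
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))
ax.set_ylabel("share of trees")
ax.legend(title="health", bbox_to_anchor=(1.02, 1), loc="upper left")
fig.tight_layout()
```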
