
UPDATED to include a sample of the data: (image of the data sample)

I have a bar graph of my data, grouped first by starting location (the five boroughs of NYC) and then, within each location, by which of two companies, X or Y, was chosen.

At the moment the graph shows 10 evenly spaced bars (one per borough and company). I still want 10 bars, but with the two company bars for each borough placed side by side, so that the chart has 5 groups instead of 10 separate bars.

I would also like to show percentages inside the bars if possible, but I don't know how to do that.

Lastly, which method would I use in the code to add titles to the axes?

I'm relatively new to Jupyter Notebook and data analytics, so any help would be greatly appreciated. Thank you so much!

(Note: It's not actually the five boroughs, but it's close enough - that's just for the purposes of the explanation, as the solution would be the same.)

My current code and graph:

df_stacked = df.groupby(['Starting Location', 'Company Chosen']).count()
# df_stacked['Customer_ID'].plot(kind = 'bar', stacks = df['BREAKDOWN', SUCCESSFUL TRIP])

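# keep only the three outcome columns (axis passed by name; recent pandas rejects it positionally)
df_stacked.drop(df_stacked.columns.difference(['Unavailable Scooter', 'BREAKDOWN', 'Successful Trip']), axis=1, inplace = True)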

df_stacked.plot.bar(stacked = True, figsize = (10, 5))

(image: my current graph)

This is what I would like it to look like: (image of the desired chart)

Ben Kluger
    Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Feb 25 '22 at 22:23
    See [pandas: How can I group a stacked bar chart?](https://stackoverflow.com/questions/59922701/pandas-how-can-i-group-a-stacked-bar-chart) – JohanC Feb 25 '22 at 22:57

1 Answer

I changed my previous answer to use random data in exactly your data format (next time, it would help to provide your data sample as text instead of an image!). I also added an example of putting text on the plot, with or without an arrow.

Also, when you use groupby, are you sure you want count() and not sum()? count() returns the number of non-missing values in each column, so right now every color in a given bar has the same height.
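To illustrate the count() versus sum() point, here is a small sketch on made-up data. It assumes (hypothetically) that each outcome column holds a 0/1 flag per trip; the column names are the ones from the question, the values are invented.

```python
import pandas as pd

# Made-up rows: one row per trip, a 0/1 flag per outcome (assumed data layout)
df = pd.DataFrame({
    'Starting Location':   ['A', 'A', 'A', 'B', 'B'],
    'Company Chosen':      ['X', 'X', 'Y', 'X', 'Y'],
    'Unavailable Scooter': [1, 0, 0, 0, 1],
    'BREAKDOWN':           [0, 1, 0, 0, 0],
    'Successful Trip':     [0, 0, 1, 1, 0],
})
outcomes = ['Unavailable Scooter', 'BREAKDOWN', 'Successful Trip']
grouped = df.groupby(['Starting Location', 'Company Chosen'])[outcomes]

# count() counts non-missing cells, so all three columns get the number of trips
# in the group and every stacked segment of a bar has the same height
counts = grouped.count()

# sum() adds up the 0/1 flags, giving the number of trips with each outcome
sums = grouped.sum()

print(counts)
print(sums)
```

If the outcome is instead stored in a single column (one label per trip, say a hypothetical 'Outcome' column), pd.crosstab with the two grouping columns as the index and that column as the columns would produce the same kind of per-outcome table.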

import matplotlib.pyplot as plt  #...

df_stacked = df.groupby(['Starting Location', 'Company Chosen']).count()  # sum()?
df_stacked.drop(df_stacked.columns.difference(['Unavailable Scooter', 'BREAKDOWN', 'Successful Trip']), 
            axis=1, inplace = True) # add axis= to avoid the warning
df_stacked.reset_index(inplace=True) 
df_stacked['X'] = df_stacked['Starting Location'] + '/' + df_stacked['Company Chosen']

fig, ax = plt.subplots(figsize = (10, 5))
ax.bar(df_stacked['X'], df_stacked['Unavailable Scooter'], label='Unavailable Scooter', edgecolor='black',
       color='b', width=0.5)
ax.bar(df_stacked['X'], df_stacked['BREAKDOWN'], label='BREAKDOWN', edgecolor='black', 
       color='r', width=0.5, bottom=df_stacked['Unavailable Scooter'])
ax.bar(df_stacked['X'], df_stacked['Successful Trip'], label='Successful Trip', edgecolor='black', 
       color='g', width=0.5, bottom=df_stacked['Unavailable Scooter']+df_stacked['BREAKDOWN'])

# Example of text/arrow on the (x,y) plot coordinates. Using ax.grid(True, which='both') would help
ax.annotate('my text', xy = (6.5, 4), xytext = (7, 5), arrowprops = 
            dict(facecolor = 'black', width = 0.5, headwidth = 3))

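# Location name under the first bar of each pair, blank label under the second
lab = df_stacked['Starting Location'].unique()
lab = [lab[i//2] if i%2==0 else '' for i in range(2*len(lab))]
ax.set_xticks(range(len(lab)))  # one tick per bar
ax.set_xticklabels(labels=lab, rotation=0)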

ax.tick_params(bottom=False)
ax.set_ylabel('Count')
ax.set(title='My title')

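# Pull each pair of bars together: with width 0.5, bar 2k is moved to span [2k-0.5, 2k]
# and bar 2k+1 to span [2k, 2k+0.5], so each pair ends up centred on tick 2k
for cont in range(3):
    for i, bars in enumerate(ax.containers[cont].get_children()):  
        bars.set_x(bars.get_x() - (0.75 if i%2 else 0.25))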
        
plt.legend()        
plt.show()

Output:
(image of the resulting chart)
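As an alternative to shifting the bars after drawing them, a sketch that places each company's bar at an explicit offset from its group centre, so the grouping does not depend on hard-coded shifts. It also sets the axis titles with ax.set_xlabel and ax.set_ylabel. The df_stacked table and its numbers are made up; it assumes the same columns as above and that every location has a row for every company.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up aggregated table, same shape as df_stacked after reset_index()
df_stacked = pd.DataFrame({
    'Starting Location':   ['A', 'A', 'B', 'B', 'C', 'C'],
    'Company Chosen':      ['X', 'Y', 'X', 'Y', 'X', 'Y'],
    'Unavailable Scooter': [3, 2, 1, 4, 2, 2],
    'BREAKDOWN':           [1, 2, 2, 1, 0, 3],
    'Successful Trip':     [6, 5, 7, 4, 8, 5],
})
outcomes = ['Unavailable Scooter', 'BREAKDOWN', 'Successful Trip']
colors = ['b', 'r', 'g']

locations = df_stacked['Starting Location'].unique()
companies = df_stacked['Company Chosen'].unique()
width = 0.4                       # width of one company's bar
base = np.arange(len(locations))  # group centres at 0, 1, 2, ...

fig, ax = plt.subplots(figsize=(10, 5))
for j, company in enumerate(companies):
    # Rows for this company, reordered to match `locations` (KeyError if one is missing)
    sub = df_stacked[df_stacked['Company Chosen'] == company]
    sub = sub.set_index('Starting Location').loc[locations]
    # Offset within the group: -width/2 for the first company, +width/2 for the second
    x = base + (j - (len(companies) - 1) / 2) * width
    bottom = np.zeros(len(locations))
    for outcome, color in zip(outcomes, colors):
        # Label only the first company's bars so each outcome appears once in the legend
        ax.bar(x, sub[outcome], width=width, bottom=bottom, color=color,
               edgecolor='black', label=outcome if j == 0 else '_nolegend_')
        bottom = bottom + sub[outcome].to_numpy()  # stack the next outcome on top

ax.set_xticks(base)
ax.set_xticklabels(locations)
ax.set_xlabel('Starting Location')  # axis titles
ax.set_ylabel('Count')
ax.set_title('Trips by starting location and company')
ax.legend()
```

Because the number of locations and companies is read from the data, nothing is hard-coded; the trade-off is that the two bars of a pair are not labelled by company, which could be added with ax.text or a second tick level if needed.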

rehaqds
    Hi, and thank you for the response! I tried out your code, and it works great with the figure you created. How exactly do the subplots work? Also, how would I apply it to my code? I apologize for not including the data; I realized that soon after posting and have since added a sample above. Would you please be able to help apply it to my scenario? Thank you so much. – Ben Kluger Feb 27 '22 at 00:14
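The question also asks for percentages inside the bars. One possible approach, assuming "percentage" means each segment's share of its own bar, is Matplotlib's ax.bar_label (available since Matplotlib 3.4) with custom label strings. The sketch below uses the pandas stacked plot from the question on a made-up table indexed by hypothetical "location/company" names. With the shifted-bar code in the answer, the same loop over ax.containers should work if it is called after the shifting loop, since bar_label positions the text from the bars' current coordinates.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up aggregated counts, one row per (location, company) bar
df_stacked = pd.DataFrame({
    'Unavailable Scooter': [3, 2, 1, 4],
    'BREAKDOWN':           [1, 2, 2, 1],
    'Successful Trip':     [6, 6, 7, 5],
}, index=['A/X', 'A/Y', 'B/X', 'B/Y'])

# Share of each segment within its own bar (each row adds up to 100)
pct = df_stacked.div(df_stacked.sum(axis=1), axis=0) * 100

ax = df_stacked.plot.bar(stacked=True, figsize=(10, 5), edgecolor='black', rot=0)
# pandas draws one BarContainer per column, in column order
for container, col in zip(ax.containers, df_stacked.columns):
    ax.bar_label(container, labels=[f'{p:.0f}%' for p in pct[col]], label_type='center')

ax.set_xlabel('Starting Location / Company')  # axis titles
ax.set_ylabel('Count')
```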