
I have a dataset with records of crop production by year, and I am visualizing the most-produced crops for each year in a stacked bar chart. The dataset is PMFBY Coverage.csv, which can be found on Kaggle.

Here is my code.

# Top Crop by year
plt.figure(figsize=(12, 6))

df_crg_[df_crg_.year==2018].groupby('cropName').size().nlargest(5).plot(kind='barh', color='red', label='2018')
df_crg_[df_crg_.year==2019].groupby('cropName').size().nlargest(5).plot(kind='barh', color='green', label='2019')
df_crg_[df_crg_.year==2020].groupby('cropName').size().nlargest(5).plot(kind='barh', color='blue', label='2020')
df_crg_[df_crg_.year==2021].groupby('cropName').size().nlargest(5).plot(kind='barh', color='maroon', label='2021')

plt.legend(loc="upper right")
plt.xlabel('Total Production Time')
plt.title('Top Crop by year')
plt.show()

And this was the output: [image: resulting bar chart]

Now if you look at the graph, you will notice that the legend entries of the stacked bar chart are reversed: it shows 2021 first instead of 2018. I want to reverse this order.

I found one solution for this question, but I don't know how to apply it: it assigns the plotting command to a single variable, whereas in my case there are four plotting commands.


An answer to this alone would do, but if you know another way to extract the top-produced crops by year, that would be great. As you can see, I am manually going through each year and extracting that year's top crops. I tried doing it with groupby but wasn't able to get the answer.

Thanks

Darkstar Dream
  • It looks like you're plotting several bar plots on top of each other, not stacking them. Also, the 4 plots probably use different orderings of the crops, generating a weird mix. This means that stacking these bars isn't feasible. You should first try to plot the bars on 4 different subplots to serve as a reference. – JohanC Jan 27 '22 at 22:29
  • `pd.crosstab(df['sssyName.year'], df['cropName']).T.sort_values(2018, ascending=False)` Your plot seems to assume the nlargest is the same for each year. – Trenton McKinney Jan 27 '22 at 22:42
  • Something more like `pd.crosstab(df['sssyName.year'], df['cropName']).T.sort_values(2018, ascending=False).head().plot(kind='barh', stacked=True)`, but this sorts by 2018 – Trenton McKinney Jan 27 '22 at 22:44
  • Thanks, @TrentonMcKinney, but actually I want n top-produced crop each year. Do you have any idea for that? – Darkstar Dream Jan 27 '22 at 22:51
  • That's why I said "Something more like" and "but this sorts by 2018" – Trenton McKinney Jan 27 '22 at 22:52
  • yeah, but this won't work, thanks anyway – Darkstar Dream Jan 27 '22 at 22:56
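
For reference, the `pd.crosstab` idea from the comments above can be tried on a small made-up table. This is only a sketch: the toy rows and the plain `year` column name are assumptions for illustration, not values from the real CSV. It builds the table with crops as rows directly instead of transposing, which should give the same result.

```python
import pandas as pd

# Toy data standing in for the real CSV (values invented for illustration).
toy = pd.DataFrame({
    "year": [2018, 2018, 2018, 2019, 2019, 2019, 2019],
    "cropName": ["Rice", "Rice", "Wheat", "Wheat", "Wheat", "Maize", "Rice"],
})

# Rows are crops, columns are years, cells are record counts.
counts = pd.crosstab(toy["cropName"], toy["year"])

# Sort by the 2018 column, as in the comment; every other year follows that order.
by_2018 = counts.sort_values(2018, ascending=False)
print(by_2018)
```

Because a single column decides the row order, the result ranks crops by 2018 only, which is the limitation both commenters point out.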

1 Answer


First off, the same 5 crops need to be selected each year. Otherwise, you can't have a fixed ordering on the y-axis.

The easiest way to get a plot of the 5 most frequent crops overall is seaborn's sns.countplot, limited to the 5 largest. Note that seaborn strongly objects to stacked bar plots, so you'll get "dodged" bars (which are easier to compare, year by year and crop by crop):

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

df = pd.read_csv('PMFBY coverage.csv')

sns.set_style('white')
order = df.groupby('cropName').size().sort_values(ascending=False)[:5].index
plt.figure(figsize=(12, 5))
ax = sns.countplot(data=df, y='cropName', order=order, hue='year')
for bars in ax.containers:
    ax.bar_label(bars, fmt='%.0f', label_type='edge', padding=2)
sns.despine()
plt.tight_layout()
plt.show()

[Figure: sns.countplot example]
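
To make the role of `order` more concrete, here is a minimal sketch on made-up data (the crop names and counts are invented, and it keeps the top 2 instead of the top 5). It shows that `order` is simply an index of category names sorted by frequency, which `sns.countplot` then uses both as the subset to draw and as the y-axis order.

```python
import pandas as pd

# Toy column: Rice appears 4 times, Wheat 3, Maize 2, Gram 1 (invented values).
toy = pd.DataFrame(
    {"cropName": ["Rice"] * 4 + ["Wheat"] * 3 + ["Maize"] * 2 + ["Gram"]}
)

# Same expression as in the answer, but keeping the top 2 instead of the top 5.
order = toy.groupby("cropName").size().sort_values(ascending=False)[:2].index
print(list(order))

# value_counts() already sorts by frequency, so this should be equivalent
# when there are no ties in the counts.
same = toy["cropName"].value_counts().index[:2]
print(list(same))
```

Crops that are not in `order` (here Maize and Gram) would simply not be drawn.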

With pandas, you can get stacked bars, but you need a bit more manipulation:

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

df = pd.read_csv('PMFBY coverage.csv')
sns.set_style('white')
order = df.groupby('cropName').size().sort_values(ascending=False)[:5].index
df_5_largest = df[df['cropName'].isin(order)]
df_5_largest_year_count = df_5_largest.groupby(['cropName', 'year']).size().unstack('year').reindex(order)
ax = df_5_largest_year_count.plot.barh(stacked=True, figsize=(12, 5))
ax.invert_yaxis()
for bars in ax.containers:
    ax.bar_label(bars, fmt='%.0f', label_type='center', color='white', fontsize=16)
sns.despine()
plt.tight_layout()
plt.show()

[Figure: pandas stacked bars]

Now, compare this with what the bars would look like if you considered the 5 largest crops of each individual year. Notice how the crops and their order are different each year. How would you combine such information into a single plot?

sns.set_style('white')
fig, axs = plt.subplots(2, 2, figsize=(14, 8))

df[df.year == 2018].groupby('cropName').size().nlargest(5).plot(kind='barh', color='C0', title='2018', ax=axs[0, 0])
df[df.year == 2019].groupby('cropName').size().nlargest(5).plot(kind='barh', color='C1', title='2019', ax=axs[0, 1])
df[df.year == 2020].groupby('cropName').size().nlargest(5).plot(kind='barh', color='C2', title='2020', ax=axs[1, 0])
df[df.year == 2021].groupby('cropName').size().nlargest(5).plot(kind='barh', color='C3', title='2021', ax=axs[1, 1])
for ax in axs.flat:
    ax.bar_label(ax.containers[0], fmt='%.0f', label_type='edge', padding=2)
    ax.margins(x=0.1)
sns.despine()
plt.tight_layout()
plt.show()

[Figure: bars for 5 largest crops for each year]
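
The four per-year filters can probably be collapsed into one groupby. The following is a sketch on invented data, not the answer's own method: it counts records per (year, crop) pair and then keeps the `n` largest counts within each year. The toy rows and `n = 2` are assumptions chosen to keep the output short.

```python
import pandas as pd

# Toy data (invented): 6 records for 2018 and 6 for 2019.
toy = pd.DataFrame({
    "year": [2018] * 6 + [2019] * 6,
    "cropName": ["Rice", "Rice", "Rice", "Wheat", "Wheat", "Maize",
                 "Maize", "Maize", "Maize", "Gram", "Gram", "Rice"],
})

n = 2  # number of top crops to keep per year

# Count records for each (year, cropName) pair.
counts = toy.groupby(["year", "cropName"]).size()

# Within each year, keep the n largest counts.
# group_keys=False avoids repeating the year as an extra index level.
top_n = counts.groupby(level="year", group_keys=False).nlargest(n)
print(top_n)
```

The result is indexed by (year, cropName), so the set of crops still differs per year; `top_n.unstack("year")` would give a crops-by-years table with NaN wherever a crop is not in that year's top `n`.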

JohanC
  • Yeah, you are right, I changed my single graph to subplots, thanks to your earlier comment above. Thanks for this awesome answer. By the way, can you tell me what the `order` command is doing? Is it also selecting the top 5 in the same order as the grouped and sliced data above? – Darkstar Dream Jan 27 '22 at 23:41
  • I am getting an error `AttributeError: 'AxesSubplot' object has no attribute 'bar_label'` with your code. – Darkstar Dream Jan 27 '22 at 23:45
  • You seem to be running an old matplotlib version. `bar_label` is new since `matplotlib 3.4`. The code will still work if you remove those calls. – JohanC Jan 27 '22 at 23:46
  • `order=` in `sns.countplot` (and similar functions) here sets a list of y-values. That list defines both the order and the subset of all possible y-values. In this case, the list is created from the index of the sorted groupby result. – JohanC Jan 27 '22 at 23:49
  • A bit of an off-topic here: If you were interested in the order of legend entries in a _vertical_ barchart, use `handles, labels = ax.get_legend_handles_labels(); ax.legend(reversed(handles), reversed(labels))` – Vojta F Apr 12 '23 at 08:47
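
The handle-reversal idiom from the last comment can be sketched as follows. The counts are invented toy values, and the legend title and location are arbitrary choices; the idea is only that `ax.get_legend_handles_labels()` returns the entries in plotting order, so passing them back reversed flips the legend without touching the bars.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy counts (invented for illustration): rows are crops, columns are years.
counts = pd.DataFrame(
    {2018: [3, 2], 2019: [1, 4], 2020: [2, 2]},
    index=["Rice", "Wheat"],
)

ax = counts.plot.barh(stacked=True)

# Fetch the legend entries in plotting order, then pass them back reversed.
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1], title="year", loc="upper right")

# Read the legend texts back to confirm the new order.
reversed_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(reversed_labels)
# Call plt.show() to display the figure interactively.
```

This works with a single plotting call because all years live on one Axes; with four separate `plot` calls on the same Axes, the same two legend lines could be applied after the last call.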