1

I'm having difficulties plotting my bar chart after I pivot my data as it can't seem to detect the column that I'm using for the x-axis.

This is the original data:

import pandas as pd

data = {'year': [2014, 2014, 2014, 2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017, 2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021],
        'sector': ['Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector',
                   'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice',
                   'Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice', 'Public Sector', 'Private Sector', 'Not in Active Practice'],
        'count': [861, 531, 2, 877, 606, 66, 899, 682, 112, 882, 765, 167, 960, 804, 203, 943, 834, 243, 1016, 876, 237, 1085, 960, 215]}
df = pd.DataFrame(data)

    year                  sector  count
0   2014           Public Sector    861
1   2014          Private Sector    531
2   2014  Not in Active Practice      2
3   2015           Public Sector    877
4   2015          Private Sector    606
5   2015  Not in Active Practice     66
6   2016           Public Sector    899
7   2016          Private Sector    682
8   2016  Not in Active Practice    112
9   2017           Public Sector    882
10  2017          Private Sector    765
11  2017  Not in Active Practice    167
12  2018           Public Sector    960
13  2018          Private Sector    804
14  2018  Not in Active Practice    203
15  2019           Public Sector    943
16  2019          Private Sector    834
17  2019  Not in Active Practice    243
18  2020           Public Sector   1016
19  2020          Private Sector    876
20  2020  Not in Active Practice    237
21  2021           Public Sector   1085
22  2021          Private Sector    960
23  2021  Not in Active Practice    215

After pivoting the data:

sector  Not in Active Practice  Private Sector  Public Sector
year                                                         
2014                         2             531            861
2015                        66             606            877
2016                       112             682            899
2017                       167             765            882
2018                       203             804            960
2019                       243             834            943
2020                       237             876           1016
2021                       215             960           1085

After tweaking the data to get the columns I want:

sector  Private Sector  Public Sector  Total in Practice
year                                                    
2014               531            861               1392
2015               606            877               1483
2016               682            899               1581
2017               765            882               1647
2018               804            960               1764
2019               834            943               1777
2020               876           1016               1892
2021               960           1085               2045

As you can see, after I have pivoted the data, there is an extra row on top of the year called 'sector'.

sns.barplot(data=df3, x='year', y="Total in Practice")

This is the code that I'm using to plot the graph but python returns with:

<Could not interpret input 'year'>

I've tried using 'sector' instead of 'year' but it returns with the same error.

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Yue Min
  • 11
  • 3

2 Answers2

2

I have copied the original data, then do the same process as you described :

import pandas as pd 
import seaborn as sns

mdic = {'year': [2014, 2014, 2014, 2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017, 2017, 
                 2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020, 2021, 2021, 2021], 
        'sector': ["Public Sector", "Private Sector", "Not in Active Practice",  "Public Sector", "Private Sector", "Not in Active Practice", 
                   "Public Sector", "Private Sector", "Not in Active Practice", "Public Sector", "Private Sector", "Not in Active Practice", 
                   "Public Sector", "Private Sector", "Not in Active Practice",  "Public Sector", "Private Sector", "Not in Active Practice", 
                   "Public Sector", "Private Sector", "Not in Active Practice", "Public Sector", "Private Sector", "Not in Active Practice"],
        'count' : [861, 531, 2, 877, 606, 66, 899, 682, 112, 882, 765, 167, 960, 804, 203, 943, 834, 243, 1016, 876, 237, 1085, 960, 215]}


data = pd.DataFrame(mdic)

data_pivot = data.pivot(index='year', columns='sector', values='count')

df3= data_pivot.drop('Not in Active Practice', axis=1)

df3['Total in Practice'] = df3.sum(axis=1)

Then got the same result as :

df3

sector  Private Sector  Public Sector  Total in Practice
year                                                    
2014               531            861               1392
2015               606            877               1483
2016               682            899               1581
2017               765            882               1647
2018               804            960               1764
2019               834            943               1777
2020               876           1016               1892
2021               960           1085               2045

The reason that you are getting the error is that when you created df3, the colum year is changed to index, here are three solutions:

First is as commented by @tdy

sns.barplot(data=df3.reset_index(), x='year', y='Total in Practice')

Second is:

sns.barplot(data=df3, x=df3.index, y="Total in Practice")

Third is when you do the pivoting add reset_index() and do the sum for specified columns:

data_pivot = data.pivot(index='year', columns='sector', values='count').reset_index()

df3= data_pivot.drop('Not in Active Practice', axis=1)

df3['Total in Practice'] = df3[['Public Sector','Private Sector']].sum(axis=1)

Then you can do bar plot with your code :

ax = sns.barplot(data=df3, x='year', y="Total in Practice")
ax.bar_label(ax.containers[0])

You get the figure :

enter image description here

HMH1013
  • 1,216
  • 2
  • 13
1
  • If you’re going to plot a pivoted (wide form) dataframe, then plot directly with pandas.dataframe.plot, which works with 'year' as the index. Leave the data in long form (as specified in the data parameter documentation) when using seaborn. Both pandas and seaborn use matplotlib.
    • seaborn doesn't recognize 'year' because it's in the dataframe index, it's not a column, as needed by the API.
  • It's not necessary to calculate a total column because this can be added to the top of stacked bars with matplotlib.pyplot.bar_label.
    • See this answer for a thorough explanation of using .bar_label.

Manage the DataFrame

# select the data to not include 'Not in Active Practice' 
df = df[df.sector.ne('Not in Active Practice')]

Plot long df with seaborn

# plot the data in long form
fig, ax = plt.subplots(figsize=(9, 7))
sns.histplot(data=df, x='year', weights='count', hue='sector', multiple='stack', bins=8, discrete=True, ax=ax)

# iterate through the axes containers to add bar labels
for c in ax.containers:
    # add the section label to the middle of each bar
    ax.bar_label(c, label_type='center')
# add the label for the total bar length by adding only the last container to the top of the bar
_ = ax.bar_label(ax.containers[-1])

enter image description here

Plot pivoted (wide) df with pandas.DataFrame.plot

# pivot the dataframe
dfp = df.pivot(index='year', columns='sector', values='count')

# plot the dataframe
ax = dfp.plot(kind='bar', stacked=True, rot=0, figsize=(9, 7))

# add labels
for c in ax.containers:
    ax.bar_label(c, label_type='center')
_ = ax.bar_label(ax.containers[-1])

enter image description here

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158