3

I have a pandas dataframe with 26 columns of numerical data. I want to represent the mean of each column in a barplot with 26 bars. This is easy to do with pandas' plotting function: df.plot(kind = 'bar'). However, the results are ugly and the column labels are often truncated, e.g.:

Truncated labels plot from pandas

I'd like to do this with seaborn instead, but can't seem to find a way no matter how hard I look. Surely there's an easy way to do a simple barplot of column averages? Thanks.

Lodore66
  • Please provide [sample data](https://stackoverflow.com/q/20109391/1422451). Please show attempted code block. Please screenshot undesired plot. – Parfait May 14 '18 at 19:01
  • You can always try: `plt.tight_layout()` before `plt.show()`. Also consider using `kind='barh'` – Anton vBR May 14 '18 at 19:11
  • Your data is in wide format. Consider reshaping to long format and have indicators *female, male, etc.* in their own columns apart from numeric value then plot by categories. – Parfait May 14 '18 at 19:14
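
The two suggestions in the comments above (horizontal bars and `plt.tight_layout()`) can be sketched roughly as follows. The frame and its column names are made up for illustration and stand in for the 26-column data from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with long column names (not from the original post)
df = pd.DataFrame({
    "a_rather_long_column_name": [1.0, 2.0, 3.0],
    "another_long_column_name": [4.0, 5.0, 6.0],
})

# Horizontal bars put the column names on the y-axis, where they read left to right
ax = df.mean().plot(kind="barh")
ax.set_xlabel("mean")

# Adjust the figure margins so the tick labels fit inside the figure
plt.tight_layout()
# Call plt.show() here to display the figure in a script
```

With many columns, a taller figure (for example via the `figsize` argument of `plot`) may also help keep the bars apart.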

4 Answers

4

You can try something like this:

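import matplotlib.pyplot as plt
import seaborn as sns
sns.set()  # apply seaborn's default styling to matplotlib plots

# df is the dataframe from the question; Series.plot returns a matplotlib Axes
ax = df.mean().plot(kind='bar')
plt.margins(0.02)
plt.ylabel('Your y-label')
plt.xlabel('Your x-label')
# Rotate and right-align the tick labels so long column names stay readable
ax.set_xticklabels(df.columns, rotation=45, ha="right")
plt.show()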

Joe
3

If anyone finds this by a search, the easiest solution I've found (I'm OP) is to use the pandas.melt() function. This stacks all the columns into a single value column and adds a second column that holds the original column name for each value. The resulting long-format dataframe can be passed directly to seaborn.
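
A minimal sketch of the melt approach, assuming a small made-up frame in place of the 26-column one. The names `long_df`, `"column"` and `"value"` are illustrative choices, not from the original post. Because seaborn receives the raw values rather than precomputed means, `sns.barplot` can still draw its default error bars.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for the 26-column frame in the question
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 6.0, 8.0],
    "c": [0.5, 1.5, 2.5],
})

# Wide -> long: one row per (column name, value) pair
long_df = df.melt(var_name="column", value_name="value")

# barplot aggregates each category with the mean by default
ax = sns.barplot(data=long_df, x="column", y="value")

# Rotate the tick labels and refit the margins so long names are not cut off
ax.tick_params(axis="x", labelrotation=45)
plt.tight_layout()
# Call plt.show() here to display the figure in a script
```

Swapping `x` and `y` in the `sns.barplot` call should give horizontal bars, which tend to suit a large number of columns better.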

Lodore66
2

You can use sns.barplot, ideally as a horizontal bar plot, which is better suited to this many categories, like this:

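import pandas as pd
import seaborn as sns

df = pd.DataFrame({'x': [0, 1], 'y': [2, 3]})
# Stack the columns into one long frame indexed by (column name, row label)
unstacked = df.unstack().to_frame()
# Column names on the y-axis give horizontal bars; bar length is the per-column mean
sns.barplot(
    y=unstacked.index.get_level_values(0),
    x=unstacked[0])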

Ami Tavory
  • Thanks, this is useful. However, I lose the error bars when I represent the dataframe as a series like this. Anyone know how I can keep them? – Lodore66 May 14 '18 at 19:53
2

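import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'x': [0, 1], 'y': [2, 3]})

# One bar per column; bar height is the column mean
sns.barplot(x = df.mean().index, y = df.mean())

plt.show()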

Rajesh Ve