
I want to plot two different bar charts on the same plot.

The two plots I want to present are X.category_name.value_counts().plot and X_sample.category_name.value_counts().plot.

I've tried this:

import matplotlib.pyplot as plt

# X, X_sample and upper_bound come from my own data
plt.figure(figsize=(8, 3))
X.category_name.value_counts().plot(kind='bar',
                                    color='blue', label='original',
                                    ylim=[0, upper_bound], width=0.2,
                                    rot=0, fontsize=12)
X_sample.category_name.value_counts().plot(kind='bar',
                                           color='orange', label='sample',
                                           ylim=[0, upper_bound], width=0.2,
                                           rot=0, fontsize=12)

But in the resulting plot, the two sets of bars are drawn on top of each other.

I need to offset the x positions of each series so that the bars are separated. But I can't do something like

a1.plot(x-offset,y,kind='bar')
a2.plot(x,y,kind='bar')

since x and y are not arguments in this case.

Juan

2 Answers


To plot a grouped bar chart with pandas, you can concatenate the two `value_counts` series into one DataFrame and plot that frame with `.plot`:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)  # fixed seed so the random example is reproducible

s1 = pd.Series(np.random.choice(list("ABCD"), size=800))
s2 = pd.Series(np.random.choice(list("ABCD"), size=400))

df = pd.concat([s1.value_counts(), s2.value_counts()], axis=1, sort=True)
df.columns = ["original", "sample"]

df.plot(kind="bar")
plt.show()

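Applied to the variables from the question, the same idea might look roughly like the sketch below. `X` and `X_sample` are built from random data purely as hypothetical stand-ins, and `upper_bound` is an assumed value (the maximum count plus 10% headroom). `pd.concat` aligns both count series on the category labels, which matters because `value_counts` orders each series by frequency, so the two orders can differ.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the asker's X and X_sample:
# each has a categorical column named category_name.
rng = np.random.default_rng(0)
X = pd.DataFrame({"category_name": rng.choice(list("ABCDE"), size=1000)})
X_sample = X.sample(n=200, random_state=0)

# Combine both count series into one frame; concat aligns them on the category labels.
counts = pd.concat(
    [X.category_name.value_counts(), X_sample.category_name.value_counts()],
    axis=1,
    sort=True,
)
counts.columns = ["original", "sample"]
counts = counts.fillna(0)  # categories missing from the sample get a count of 0

# Assumed y limit: the largest count plus 10% headroom.
upper_bound = counts.to_numpy().max() * 1.1

# One call draws both columns as side-by-side bars for each category.
ax = counts.plot(
    kind="bar",
    color=["blue", "orange"],
    ylim=[0, upper_bound],
    rot=0,
    fontsize=12,
    figsize=(8, 3),
)
plt.show()
```

The column names become the legend labels, so the separate `label=` arguments from the question are no longer needed.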

ImportanceOfBeingErnest
  • Very nice. But do you know why the x kwarg of `df.plot.bar` does not take effect? – SpghttCd Nov 04 '18 at 22:19
  • I think it does; but there is no need for it since it's the index anyways. – ImportanceOfBeingErnest Nov 04 '18 at 22:21
  • My understanding of the docs is that it's the index as long as no x is defined. But defining it doesn't change the position; it's still plotted over the index, hence no effect imo. – SpghttCd Nov 04 '18 at 22:26
  • @SpghttCd I see what you mean. "label or position" refers to a label or position *in the dataframe*, i.e. a column name or column number. – ImportanceOfBeingErnest Nov 04 '18 at 22:32
  • Ok, then that was my misconception; thanks for clarifying. – SpghttCd Nov 04 '18 at 22:34
  • @SpghttCd But your second part of the answer was still correct, right? Using matplotlib with shifted positions is definitely an option one should consider, at least for more complicated cases, like [this one](https://stackoverflow.com/questions/53071143/fitting-bar-width-in-matplotlib-when-indices-contain-differing-number-of-null-va/53105439#53105439) where pandas simply doesn't provide any direct option. – ImportanceOfBeingErnest Nov 04 '18 at 22:47
  • Yes, you're right, I'll edit and delete only the part based on my mistake. Thanks again for pointing that out. – SpghttCd Nov 04 '18 at 22:53
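To illustrate the point settled in these comments, here is a small sketch with made-up counts in which the categories live in a regular column rather than in the index. `x="category"` selects that column by label; it only supplies the tick labels, while the bars stay at integer positions 0, 1, 2, ...

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical counts, with the categories stored in an ordinary column.
df = pd.DataFrame({
    "category": ["A", "B", "C", "D"],
    "original": [210, 190, 205, 195],
    "sample": [98, 102, 101, 99],
})

# x="category" names a *column*: pandas uses it for the tick labels
# and plots the remaining numeric columns as grouped bars.
ax = df.plot.bar(x="category", rot=0)

labels = [t.get_text() for t in ax.get_xticklabels()]
positions = ax.get_xticks().tolist()  # still integer slots, not shifted by x
print(labels, positions)
plt.show()
```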

In pandas bar plots, you are not as free to adjust bar positions manually as you are in matplotlib itself (see also the comments under the answer by @ImportanceOfBeingErnest).
For the general case, though, it is entirely sufficient to put the data columns that belong in one grouped bar chart into a single DataFrame and let pandas arrange the bars automatically.

However, for those who'd like to see how to implement the requested offset idea in matplotlib:

Example:

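import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.array([[4, 3, 2, 1], [2, 1, .5, .2]]).T, columns=['A', 'B'], index=list('asdf'))

fig, ax = plt.subplots()

x = np.arange(df.A.size)  # one integer position per category
ax.bar(x=x - .1, height=df.A, color='b', width=.2)       # shift A left by half a bar width
ax.bar(x=x + .1, height=df.B, color='orange', width=.2)  # shift B right by half a bar width
ax.set_xticks(x)
ax.set_xticklabels(df.index)
plt.show()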

Result: a grouped bar chart with the `A` bars (blue) and the `B` bars (orange) side by side at each index label.
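The same offset idea generalizes to any number of columns. The sketch below uses a hypothetical helper, `grouped_bar` (not part of pandas or matplotlib), that splits a total group width evenly among the columns and shifts bar `i` by `(i - (n - 1) / 2)` bar widths from the center of its slot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def grouped_bar(ax, df, total_width=0.8):
    """Hypothetical helper: draw one bar per column of df at each index position,
    shifted so that each group of bars is centered on its tick."""
    n = len(df.columns)
    bar_width = total_width / n
    x = np.arange(len(df.index))  # one integer slot per category
    for i, col in enumerate(df.columns):
        # Offsets run from -(n-1)/2 to +(n-1)/2 bar widths around the slot center.
        offset = (i - (n - 1) / 2) * bar_width
        ax.bar(x + offset, df[col], width=bar_width, label=col)
    ax.set_xticks(x)
    ax.set_xticklabels(df.index)
    ax.legend()
    return ax


df = pd.DataFrame(np.array([[4, 3, 2, 1], [2, 1, .5, .2]]).T, columns=['A', 'B'], index=list('asdf'))
fig, ax = plt.subplots()
grouped_bar(ax, df, total_width=0.4)  # 0.2-wide bars at -0.1 / +0.1, as in the example above
plt.show()
```

With two columns and `total_width=0.4`, this reproduces the ±0.1 offsets and 0.2 bar width used above; a third column would simply get its own slot without any manual offset bookkeeping.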

SpghttCd
  • The second plot is what I want to get. But how can I get my `x` variable from `value_counts`? – Juan Nov 04 '18 at 21:54
  • That should be `X.category_name.value_counts().index`, but that is only needed for labeling, as you can see above. For positioning, you need `np.arange` over its length, just like `df.A` and `df.A.size` in the example. – SpghttCd Nov 04 '18 at 22:12
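Putting these two hints together, a sketch for the asker's case could look like the following. `X` and `X_sample` are again hypothetical random stand-ins. One extra assumption is worth stating: the bars only line up if both count series share the same category order, and since `value_counts` sorts each series by frequency, the sketch reindexes the sample counts to the original order (a step the comment does not mention).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-ins for the asker's data.
rng = np.random.default_rng(1)
X = pd.DataFrame({"category_name": rng.choice(list("ABCD"), size=400)})
X_sample = X.sample(n=100, random_state=1)

orig_counts = X.category_name.value_counts()
# value_counts sorts by frequency, so put the sample counts in the same category order;
# categories missing from the sample get a count of 0.
sample_counts = X_sample.category_name.value_counts().reindex(orig_counts.index, fill_value=0)

x = np.arange(len(orig_counts))  # bar positions come from the length...
fig, ax = plt.subplots(figsize=(8, 3))
ax.bar(x - 0.1, orig_counts.to_numpy(), width=0.2, color="blue", label="original")
ax.bar(x + 0.1, sample_counts.to_numpy(), width=0.2, color="orange", label="sample")
ax.set_xticks(x)
ax.set_xticklabels(orig_counts.index)  # ...and the tick labels from the index
ax.legend()
plt.show()
```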