I have a .CSV that has 1 column with categorical data and 2 columns with numerical data. I want to create a barplot that has the average of column 2 and the average of column 3 plotted side by side for each category in column 1.

Example data:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = [
    ['Bananas', 1, 5],
    ['Apple', 4, 2],
    ['Bananas', 2, 5],
    ['Oranges', 3, 6],
    ['Apple', 4, 2],
    ['Apple', 2, 5],
    ['Oranges', 1, 2],
    ['Apple', 5, 5],
    ['Oranges', 6, 1]
]

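df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])
# astype returns a new Series, so assign the result back to the column
df['Column2'] = df['Column2'].astype(float)
df['Column3'] = df['Column3'].astype(float)
df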

I can draw a single bar plot just fine, using either Column2 or Column3 as the y variable.

df = df.sort_values(by='Column1')
plt.figure(figsize=(30,16))
p = sns.barplot(x="Column1", y="Column2", data=df, errorbar=('ci', False), palette="tab20")
plt.xticks(fontsize=14)
plt.yticks(fontsize=20)
plt.bar_label(p.containers[0],size=18,fmt="%.2f")

However, I'm having trouble plotting both Column2 and Column3 on the same graph. I tried the same code as above, replacing the y argument with y1="Column2", y2="Column3", but I get the TypeError "Horizontal orientation requires numeric x variable." I've also tried using hue:

df = df.sort_values(by='Column1')
plt.figure(figsize=(30,16))
p = sns.barplot(x="Column1", y="Column2", hue="Column3", data=df, errorbar=('ci', False), palette="tab20")
plt.xticks(fontsize=14)
plt.yticks(fontsize=20)
plt.bar_label(p.containers[0],size=18,fmt="%.2f")

but the resulting barplot is wrong: the averages aren't calculated correctly, and the bar labels only show up on one bar.

I've recreated what I was looking for in Excel with the sample data. Ideally, I'd want both bars to be the same color for each fruit too.

Chart

Update: I've got it plotted correctly now

dfm = df.melt(id_vars=["Column1"], var_name="Test1", value_name="Test2")
plt.figure(figsize=(30,16))
p=sns.barplot(data=dfm, x="Column1", y="Test2", hue="Test1", palette="tab20", errorbar=('ci', False))
plt.xticks(fontsize=14)
plt.yticks(fontsize=20)
plt.legend([],[], frameon=False)
# Use a separate loop variable so the Axes object `p` is not overwritten
for container in p.containers:
    plt.bar_label(container, fmt='%.2f', label_type='edge', padding=1, fontsize=20)

However, I'm still struggling to give each category its own color while keeping the two bars within each group the same color.

  • You need to convert the dataframe to long form, using pandas's `melt` to combine the two columns into one. Then, you can use these as hue. – JohanC Dec 29 '22 at 01:34
  • While the duplicate answers show how to use `melt`, they don't show how to make both bars the same colour, as in the screenshot. – ScottC Dec 29 '22 at 02:37
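One possible way to get a single colour per category, shared by both bars in the group, is to compute the means with pandas and draw the bars with plain matplotlib rather than `sns.barplot`, passing the same per-category colour list to each `ax.bar` call. The sketch below is only an illustration under that assumption: the helper name `plot_grouped_means`, the figure size, and the choice of the `tab20` colormap are not from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = [
    ['Bananas', 1, 5], ['Apple', 4, 2], ['Bananas', 2, 5],
    ['Oranges', 3, 6], ['Apple', 4, 2], ['Apple', 2, 5],
    ['Oranges', 1, 2], ['Apple', 5, 5], ['Oranges', 6, 1],
]
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'])


def plot_grouped_means(df, category='Column1', values=('Column2', 'Column3')):
    """Hypothetical helper: grouped bars of per-category means, one colour per category."""
    # Mean of each numeric column per category (groupby sorts the categories).
    means = df.groupby(category)[list(values)].mean()

    # One colour per category, reused for every bar in that category's group.
    cmap = plt.get_cmap('tab20')
    colors = [cmap(i % cmap.N) for i in range(len(means))]

    x = np.arange(len(means))
    width = 0.8 / len(values)

    fig, ax = plt.subplots(figsize=(10, 6))
    for j, col in enumerate(values):
        # Shift each column's bars so the group is centred on the tick.
        offset = (j - (len(values) - 1) / 2) * width
        bars = ax.bar(x + offset, means[col].to_numpy(), width,
                      color=colors, edgecolor='black')
        ax.bar_label(bars, fmt='%.2f', padding=1)

    ax.set_xticks(x)
    ax.set_xticklabels(means.index)
    ax.set_xlabel(category)
    ax.set_ylabel('Mean')
    return ax, means


ax, means = plot_grouped_means(df)
```

Because both bars in a group share a colour, the left bar is Column2 and the right bar is Column3 by position only; a hatch pattern on one of them or an axis annotation may be worth adding if that distinction matters. In a script, call `plt.show()` afterwards to display the figure.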
