Using Pandas and Python I am trying to achieve a barplot.

The data is imported from a CSV into a dataframe in Pandas.

There are several groups of bars, and the bars are distinguished by a row value in one of the columns: the categories A, B, C, D, E. These categories come from a column in the CSV called Category. As the attached picture shows, A corresponds to the light grey, B to the lightest blue, and so on until E.

Each row has two columns (COLUMN1, COLUMN2) that are relevant for the y-values (from 1 to 5; these give the heights of the bars) in the barplot. Looking at the attached picture: the non-transparent bars are constructed from COLUMN1, and the semi-transparent bars are constructed from COLUMN2.

[Image: Barplot with groups of bars including overlay bars]

EDIT

Here is how the layout of the data in the imported CSV/dataframe is:

Category    COLUMN1         COLUMN2     Month    
A          0.2               3          Jan   
B          0.3               5          Jan 
C          0.7               4          Jan
D          0.4               3          Jan
E          0.8               5          Jan
A          0.3               4          Feb
B          0.75             4.5         Feb

END EDIT

I have managed to make the plot with the groups of 5 non-transparent bars, but I have no clue how to get the semi-transparent bars into the same plot, as shown in the attached picture. Any suggestions? My problem is that I need to add the values from COLUMN2 as semi-transparent bars (they don't have to be semi-transparent, but that might be the easiest way to tell them apart).

This is the code I have so far:

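import pandas as pd

# Read the CSV, naming the four columns explicitly
df = pd.read_csv("filename_for_import.csv",
                 names=["Category", "COLUMN1", "COLUMN2", "Month"],
                 encoding="UTF-8")

# Desired month order on the x-axis
order = ['Jan', 'Feb', 'Mar', 'Apr']

# One group of bars per month, one bar per category (COLUMN1 only)
d = pd.pivot_table(df, index='Month', columns='Category',
                   values='COLUMN1').loc[order].plot(kind='bar', grid=True)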

EDIT 2

Just realized a potential issue depending on the data used. Adjusted the value of A in COLUMN1 to be bigger than B on the first row, to illustrate an example.

Category    COLUMN1         COLUMN2     Month    
A          4.5                3         Jan   
B          0.3               5          Jan 
C          0.7               4          Jan
D          0.4               3          Jan
E          0.8               5          Jan
A          0.3               4          Feb
B          0.75             4.5         Feb

Discovered that the non-transparent bars are placed on top, making it impossible to see the semi-transparent bars when they are lower than the non-transparent bars. Changing the order in the code provided below by Parfait doesn't seem to make any difference, and neither does switching between col/ax 1 and 2 in the same code. It appears as if the non-transparent bars are always placed on top no matter what. Is there some way to override this?

  • No one has your file. Mock up some data that mimics its structure. Here's a recent example that nicely demonstrates how to do that: https://stackoverflow.com/questions/48831722/add-columns-to-pandas-dataframe-on-different-conditions – Paul H Feb 16 '18 at 20:26

1 Answer

Consider twiny to overlay the shorter COLUMN1 plot and the larger COLUMN2 pivot plot in one figure; the twin axes share the same y-axis. Use alpha to adjust the opacity. As for the pivot_table, use reindex to put the month values in order.

from io import StringIO
import pandas as pd
import matplotlib.pyplot as plt

txt = '''Category    COLUMN1         COLUMN2     Month    
A          0.5               3          Jan   
B          0.3               5          Jan 
C          0.7               4          Jan
D          0.4               3          Jan
E          0.8               5          Jan
A          0.3               4          Feb
B          0.75             4.5         Feb'''

df = pd.read_table(StringIO(txt), sep=r"\s+")  # raw string avoids an invalid-escape warning

order = ['Jan', 'Feb', 'Mar', 'Apr']

fig, ax = plt.subplots()
ax2 = ax.twiny()

col1 = pd.pivot_table(df,index='Month',columns='Category',values='COLUMN1').reindex(order)
col1.plot(kind='bar', ax=ax)

col2 = pd.pivot_table(df,index='Month',columns='Category',values='COLUMN2').reindex(order)
col2.plot(kind='bar', ax=ax2, alpha=0.5, legend=False)
ax2.xaxis.set_visible(False)

plt.show()  

[Image: Plot output]
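To illustrate why reindex is suggested for the month values, here is a minimal sketch on toy data (the three-column frame below is an assumption made for illustration, not the asker's file). It shows that pivot_table sorts the index alphabetically and that reindex restores calendar order while adding empty rows for months with no data. As far as I know, selecting missing labels with .loc raises a KeyError in pandas 1.0 and later, which the last step tries out.

```python
import pandas as pd

# Toy data: only Jan and Feb are present (assumption for illustration).
df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "COLUMN1": [0.5, 0.3, 0.3, 0.75],
    "Month": ["Jan", "Jan", "Feb", "Feb"],
})
order = ["Jan", "Feb", "Mar", "Apr"]

pivot = pd.pivot_table(df, index="Month", columns="Category", values="COLUMN1")

# pivot_table sorts the index, so Feb comes before Jan here.
print(list(pivot.index))

# reindex puts the rows in calendar order and adds all-NaN rows for Mar and Apr.
ordered = pivot.reindex(order)
print(ordered)

# Selecting labels that are not in the index with .loc is expected to fail
# on recent pandas versions; reindex does not have this problem.
try:
    pivot.loc[order]
except KeyError as exc:
    print("KeyError:", exc)
```

The NaN rows simply show up as empty groups on the x-axis when the reindexed frame is plotted.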

Parfait
  • This works great, however there is one detail I noticed. When the semi-transparent bars are shorter than the corresponding non-transparent bars, the semi-transparent bars are completely out of sight. That leads me to assume the semi-transparent bars are drawn first in the background, and the non-transparent bars are put on top. How can this be reversed, so that the semi-transparent bars appear on top? – BoroBorooooooooooooooooooooooo Feb 17 '18 at 09:17
  • You can't mix the opacity between bars, only between whole plots. You can't have one category *A* use the opposite transparency from the other categories in the same plot depending on the data. You may need to re-order your data so that the solid colors always hold the lower values. Maybe even run two different figures for the filtered cases: COLUMN1 > COLUMN2 and COLUMN1 < COLUMN2. – Parfait Feb 17 '18 at 14:48
  • Yes that is what I meant, like making the entire opaque plot be on top of the other for all bars regardless of values. There should be an option for that right? Right now the opaque plot is behind the non-opaque plot, so I just want to reverse that overall order. – BoroBorooooooooooooooooooooooo Feb 17 '18 at 16:13
  • The reason you do not see any change in the overlay when switching `ax` is that, regardless of order, the solid plot will always render stronger than the transparent one. Test this by not using `alpha` but passing different solid colors (use the *color* arg list in `plot`), then switch `ax`. [See named list here](https://stackoverflow.com/a/37232760/1422451). – Parfait Feb 17 '18 at 18:10
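
One possible workaround for the case raised in these comments, sketched as an assumption rather than something the answer proposes: since a semi-transparent bar drawn over a solid bar of the same color blends into it, the COLUMN2 overlay can be drawn with no fill and a hatched outline instead. The sketch below uses matplotlib's Axes.bar directly on a single Axes rather than twiny; the function name plot_overlay, the 0.8 group width and the hatch style are hypothetical choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_overlay(df, order):
    """Draw COLUMN1 as solid bars and COLUMN2 as hatched outlines on one Axes."""
    col1 = pd.pivot_table(df, index="Month", columns="Category",
                          values="COLUMN1").reindex(order)
    col2 = pd.pivot_table(df, index="Month", columns="Category",
                          values="COLUMN2").reindex(index=order,
                                                    columns=col1.columns)

    fig, ax = plt.subplots()
    x = np.arange(len(order))   # one tick position per month
    n = len(col1.columns)       # number of bars in each group
    width = 0.8 / n             # each group spans 0.8 of a tick (arbitrary choice)

    for i, cat in enumerate(col1.columns):
        # Center the n bars of a group around the month's tick position.
        pos = x + (i - (n - 1) / 2) * width
        # Solid bar for COLUMN1; missing months are drawn with height 0.
        ax.bar(pos, col1[cat].fillna(0).to_numpy(), width,
               color=f"C{i}", label=cat, zorder=2)
        # Unfilled, hatched bar for COLUMN2, drawn above the solid bar, so it
        # stays visible whether it is taller or shorter than COLUMN1.
        ax.bar(pos, col2[cat].fillna(0).to_numpy(), width,
               color="none", edgecolor="black", hatch="//", zorder=3)

    ax.set_xticks(x)
    ax.set_xticklabels(order)
    ax.legend(title="Category")
    return fig, ax
```

With the df and order from the answer, `fig, ax = plot_overlay(df, order)` followed by `plt.show()` should display both columns for every category, including rows where COLUMN1 is larger than COLUMN2.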