
Here is a sample data set; it is similar to my original data set.

Country,state,sex,dist,population,LL,UL
IND,AP,F,EG,82,80,150
IND,AP,F,WG,150,80,150
IND,AP,F,VZA,95,80,150
IND,AP,F,BZA,116,80,150
IND,AP,M,EG,180,80,150
IND,AP,M,WG,158,80,150
IND,AP,M,VZA,77,80,150
IND,AP,M,BZA,114,80,150
IND,UP,F,A,86,80,150
IND,UP,F,B,179,80,150
IND,UP,M,C,83,80,150
IND,UP,M,D,146,80,150

I want to create a plot similar to the one below (which was created in Excel), but in Python.
[image: target chart created in Excel]
Please help me create this plot in Python. I tried using matplotlib but could not replicate the chart; mainly, I cannot pass the column names to the x-axis the way the chart does.
Thanks in advance.

jagan k

2 Answers


Unfortunately, matplotlib has no simple built-in way to add hierarchical axis labels. Below is one solution, adapted from an existing example. 'chart_data.csv' refers to the example data you provided.

demo plot

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from itertools import groupby

# add divider lines
def add_line(ax, xpos, ypos):
    line = plt.Line2D([xpos, xpos], [ypos + .1, ypos],
                      transform=ax.transAxes, color='black')
    line.set_clip_on(False)
    ax.add_line(line)

# Get counts for each label for this level
def label_len(my_index,level):
    labels = my_index.get_level_values(level)
    #eg '[('AP', 8), ('UP', 4)]'
    return [(k, sum(1 for i in g)) for k,g in groupby(labels)]

# add divider lines and labels to plot
def add_xaxis_group_labels(ax, df):
    ypos = -.1
    scale = 1./df.index.size
    for level in range(df.index.nlevels)[::-1]:
        pos = 0
        for label, rpos in label_len(df.index,level):
            lxpos = (pos + .5 * rpos)*scale
            ax.text(lxpos, ypos, label, ha='center', transform=ax.transAxes)
            add_line(ax, pos*scale, ypos)
            pos += rpos
        add_line(ax, pos*scale , ypos)
        ypos -= .1

#define data
df = pd.read_csv('chart_data.csv')
df = df.set_index(['Country','state','sex','dist'])

# instantiate figure/axes and draw the line plot on them
fig, ax = plt.subplots()
df.plot(kind='line', ax=ax)

# remove default labels
labels = ['' for _ in ax.get_xticklabels()]
ax.set_xticklabels(labels)
ax.set_xlabel('')

#add new labels
add_xaxis_group_labels(ax, df)

#beautify
fig.subplots_adjust(bottom=.1*df.index.nlevels)

#move legend outside
ax.legend(bbox_to_anchor=(1.1, 1.05))

plt.show()

If you want to keep things simple and you don't mind it being a bit messy, this is the easiest way to add all the labels:

simple version

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('chart_data.csv')
df = df.set_index(['Country','state','sex','dist'])

# rot=90 rotates the tick labels so the full index tuples fit
df.plot(rot=90)
plt.show()

I would suggest, however, that because this is categorical data, the population should be displayed as a bar chart, with the upper and lower limits drawn as "threshold" lines. You can do this by changing kind='line' to kind='bar' and adding the two threshold lines with plt.axhline, for example plt.axhline(y=80, color='red') for the lower limit, before plotting. You would also have to restrict df to the population column only.
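A minimal sketch of that bar-chart suggestion, in case it helps. It inlines the sample data from the question instead of reading 'chart_data.csv', and it assumes LL and UL are constant across rows (as they are in the sample), so it takes the first value of each for the threshold lines. The variable names lower and upper are only for illustration.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = """Country,state,sex,dist,population,LL,UL
IND,AP,F,EG,82,80,150
IND,AP,F,WG,150,80,150
IND,AP,F,VZA,95,80,150
IND,AP,F,BZA,116,80,150
IND,AP,M,EG,180,80,150
IND,AP,M,WG,158,80,150
IND,AP,M,VZA,77,80,150
IND,AP,M,BZA,114,80,150
IND,UP,F,A,86,80,150
IND,UP,F,B,179,80,150
IND,UP,M,C,83,80,150
IND,UP,M,D,146,80,150
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df.set_index(["Country", "state", "sex", "dist"])

fig, ax = plt.subplots()

# Plot only the population column, one bar per row.
df["population"].plot(kind="bar", ax=ax)

# Assumption: LL and UL are the same for every row, so the first value is enough.
lower = df["LL"].iloc[0]
upper = df["UL"].iloc[0]
ax.axhline(y=lower, color="red", linestyle="--", label="LL")
ax.axhline(y=upper, color="red", label="UL")

ax.set_ylabel("population")
ax.legend()
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to see the result. If the limits vary per row in the real data, horizontal lines no longer fit and the limits would have to be drawn per bar instead.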


Daniel Redgate

You can simply do this using the pandas DataFrame itself:

df.plot()

(but the x-axis labels are different)
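If the default tick labels are the only problem, one possible workaround is to build a flat label per row from the index levels and set the ticks by hand. This is only a sketch: it assumes the index is set to the four grouping columns as in the other answer, it inlines the sample data, and the "state/sex/dist" label format is an arbitrary choice. It does not reproduce the grouped (hierarchical) axis from the Excel chart.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question, inlined so the sketch is self-contained.
csv_text = """Country,state,sex,dist,population,LL,UL
IND,AP,F,EG,82,80,150
IND,AP,F,WG,150,80,150
IND,AP,F,VZA,95,80,150
IND,AP,F,BZA,116,80,150
IND,AP,M,EG,180,80,150
IND,AP,M,WG,158,80,150
IND,AP,M,VZA,77,80,150
IND,AP,M,BZA,114,80,150
IND,UP,F,A,86,80,150
IND,UP,F,B,179,80,150
IND,UP,M,C,83,80,150
IND,UP,M,D,146,80,150
"""

df = pd.read_csv(io.StringIO(csv_text))
df = df.set_index(["Country", "state", "sex", "dist"])

# Line plot of population, LL and UL; rows are drawn at positions 0..n-1.
ax = df.plot()

# One flat label per row; Country is dropped because it is constant here.
labels = [f"{state}/{sex}/{dist}" for _, state, sex, dist in df.index]

# Put a tick at every row position and attach the flat labels.
ax.set_xticks(range(len(df)))
ax.set_xticklabels(labels, rotation=90)
ax.set_xlabel("state/sex/dist")
ax.figure.tight_layout()
```

Call plt.show() afterwards to display the figure.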