
I would like to plot histograms of three vectors that share the same x-axis, namely the classes. classes is a set of 101 string labels:

classes={'playingguitar', 'billiards', 'boxingboxingspeedbag', 'applylipstick', 'playingsitar', 'fieldhockeypenalty', 'blowingcandles', 'longjump', 'playingdhol', 'biking', 'playingpiano', 'handstandwalking', 'playingcello', 'ropeclimbing', 'hulahoop', 'cricketshot', 'punch', 'pushups', 'floorgymnastics', 'jumpingjack', 'lunges', 'golfswing', 'bandmarching', 'skiing', 'playingtabla', 'archery', 'breaststroke', 'unevenbars', 'playingviolin', 'babycrawling', 'moppingfloor', 'bowling', 'knitting', 'rockclimbingindoor', 'shavingbeard', 'writingonboard', 'shotput', 'stillrings', 'drumming', 'applyeyemakeup', 'cuttinginkitchen', 'pizzatossing', 'soccerpenalty', 'bodyweightsquats', 'taichi', 'benchpress', 'trampolinejumping', 'playingdaf', 'pullups', 'pommelhorse', 'jumprope', 'headmassage', 'horserace', 'skijet', 'surfing', 'basketballdunk', 'polevault', 'brushingteeth', 'salsaspin', 'frontcrawl', 'horseriding', 'typing', 'throwdiscus', 'nunchucks', 'diving', 'balancebeam', 'highjump', 'volleyballspiking', 'icedancing', 'cricketbowling', 'rafting', 'yoyo', 'walkingwithdog', 'swing', 'hammering', 'mixing', 'wallpushups', 'parallelbars', 'skateboarding', 'skydiving', 'jugglingballs', 'soccerjuggling', 'kayaking', 'cleanandjerk', 'tennisswing', 'playingflute', 'javelinthrow', 'haircut', 'blowdryhair', 'cliffdiving', 'frisbeecatch', 'boxingspeedbag', 'handstandpushups', 'militaryparade', 'hammerthrow', 'rowing', 'basketball', 'baseballpitch', 'tabletennisshot', 'fencing', 'sumowrestling'}
len(classes)=101

labels2, labels2_test and labels2_full_train hold the class names, and values2, values_test2 and values2_full_train hold the number of occurrences of each class. Each pair comes out in a different order:

from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
import pylab
labels2, values2 = zip(*Counter(train2).items())
labels2_test, values_test2 = zip(*Counter(test).items())
labels2_full_train, values2_full_train = zip(*Counter(full_train).items())

I would like to make a plot in which the x-axis represents the classes and the y-axis the number of occurrences of each class in values2, values_test2 and values2_full_train.

What have I tried?

pylab.rcParams['figure.figsize'] = (30, 10)
fig1, ax1 = plt.subplots()
ax1.tick_params(rotation=90)
ax1.plot(labels2, values2, label='train classes')
ax1.plot(labels2_test, values_test2, label='test classes')
ax1.plot(labels2_full_train, values2_full_train, label='full train classes')
ax1.set_xlabel("classes",rotation='vertical')
ax1.set_ylabel("number of examples")
ax1.set_title("data distribution")
ax1.legend(loc='best')
fig1.show()

However, I get something like the following:

[image: resulting plot]

This happens because labels2, labels2_test and labels2_full_train are not ordered in the same way in:

labels2, values2 = zip(*Counter(train2).items())
labels2_test, values_test2 = zip(*Counter(test).items())
labels2_full_train, values2_full_train = zip(*Counter(full_train).items())

So how can I get labels2, labels2_test and labels2_full_train in the same order (for instance, the order defined in classes)?

For instance:

labels2=['rafting', 'punch', 'applyeyemakeup',...]
values2=[78, 112, 106,...]
labels2_test=['typing', 'surfing', 'cricketbowling',..]
values_test2=[46, 38, 39,...]
labels2_full_train=['archery', 'benchpress', 'brushingteeth',...]
values2_full_train=[1046, 1043, 1065,...]

Thank you.

Joseph
  • As commented below your last question: The order of the labels is always the same. It is alphabetical. If you want to have a line through the points which follows this order, you need to sort the input data by the string values previous to plotting. – ImportanceOfBeingErnest Feb 09 '18 at 12:26
  • Do you need help with sorting one list on the items of another one? In how far did other questions on sorting lists here on SO not help you? – ImportanceOfBeingErnest Feb 09 '18 at 12:29
  • I tried, for example, labels2, values2 = sorted(zip(*Counter(train2).items())) but it is not accepted: TypeError: '<' not supported between instances of 'int' and 'str' – Joseph Feb 09 '18 at 12:32
  • @ImportanceOfBeingErnest yes, I need help sorting one list on the items of another one – Joseph Feb 09 '18 at 12:33
  • @ImportanceOfBeingErnest, I just checked, and the strings are not ordered alphabetically. For instance, the archery class is supposed to be at the beginning, but I find it in the middle of the list for labels2 and at the end for labels2_test. – Joseph Feb 09 '18 at 12:40
  • What I meant to say is "The order of the **tick**labels **in the plot** is always the same. It is alphabetical.", sorry I did not think about you calling your lists "label" as well. – ImportanceOfBeingErnest Feb 09 '18 at 12:42

1 Answer


The problem

Because matplotlib shows categorical variables sorted on the axes, you need to sort the lists alphabetically before plotting. So let's create a complete and verifiable example:

from collections import Counter

list1 = list("ABADEAABEDDAEDBBBBBD") # letters A, B, D, E
list2 = list("AABAEEDCCFFFEDABEEC")  # all letters A-F

items1, counts1 = zip(*sorted(Counter(list1).items()))
items2, counts2 = zip(*sorted(Counter(list2).items()))


import matplotlib.pyplot as plt
plt.plot(items1, counts1, label="list1")
plt.plot(items2, counts2, label="list2")
plt.legend()
plt.show()

Note that the first list contains only a subset of all possible items. The output looks like this:

[image: plot output]

So unfortunately, although the lists themselves are sorted, the plot behaves strangely: the axis shows C and F at the end.

The solution

The solution is to let the axis know about all possible items beforehand. We could, for example, first draw an invisible plot of all items on the axes:

import matplotlib.pyplot as plt

plt.plot(items1+items2, [5]*len(items1+items2), visible=False)

plt.plot(items1, counts1, "o-", label="list1")
plt.plot(items2, counts2, "o-", label="list2")
plt.legend()
plt.show()

[image: plot output]
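As an alternative sketch that is not part of the original answer: instead of relying on an invisible plot, the counts can be aligned to one shared, sorted category list before plotting. The helper name aligned_counts is hypothetical. It relies on Counter returning 0 for missing keys, so a category that is absent from a list appears with a count of 0 rather than being skipped, which changes the look of the line compared with the plot above.

```python
from collections import Counter


def aligned_counts(samples, categories):
    """Return the count of each category in `samples`, in `categories` order.

    Hypothetical helper for illustration only.
    """
    counter = Counter(samples)
    # Counter yields 0 for keys it has never seen, so missing categories become 0.
    return [counter[c] for c in categories]


list1 = list("ABADEAABEDDAEDBBBBBD")  # letters A, B, D, E
list2 = list("AABAEEDCCFFFEDABEEC")   # all letters A-F

# One shared x-axis: every category seen in either list, sorted alphabetically.
categories = sorted(set(list1) | set(list2))

counts1 = aligned_counts(list1, categories)
counts2 = aligned_counts(list2, categories)

print(categories)  # ['A', 'B', 'C', 'D', 'E', 'F']
print(counts1)     # [5, 7, 0, 5, 3, 0]
print(counts2)     # [4, 2, 3, 2, 5, 3]
```

Passing categories together with counts1 and then with counts2 to plt.plot gives both lines identical x values in identical order, so the axis order no longer depends on which list is plotted first. For the question's data, the fixed order could be sorted(classes).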

ImportanceOfBeingErnest
  • Thank you for your answer. Sorry, I did not mention this in the question: I would like to get histograms for that data rather than a line plot, as is done here https://stackoverflow.com/questions/6871201/plot-two-histograms-at-the-same-time-with-matplotlib but the problem of bins persists – Joseph Feb 09 '18 at 14:08
  • You mean replace `plot` with `bar`? – ImportanceOfBeingErnest Feb 09 '18 at 14:13
  • In your example it would be bins A B C D E F for list1 and bins A B C D E F for list2 – Joseph Feb 09 '18 at 14:28
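For the bar-chart variant raised in the comments above, here is a minimal sketch, assuming that what is wanted is grouped bars with one bar per list in each category bin. The helper names grouped_bar_layout and plot_grouped_bars and the group width of 0.8 are illustrative choices, not taken from the answer. The bars are placed at numeric positions and the category names are set as tick labels afterwards, so the order is fully under the caller's control.

```python
from collections import Counter


def grouped_bar_layout(n_categories, n_series, group_width=0.8):
    """Return (x positions per series, bar width) for grouped bars.

    Hypothetical helper: each group of bars is centred on an integer tick.
    """
    bar_width = group_width / n_series
    positions = []
    for s in range(n_series):
        # Shift each series so the whole group straddles its tick evenly.
        offset = (s - (n_series - 1) / 2) * bar_width
        positions.append([i + offset for i in range(n_categories)])
    return positions, bar_width


def plot_grouped_bars(categories, series):
    """Draw one bar per series in every category bin.

    `series` maps a legend label to a list of raw samples.
    """
    import matplotlib.pyplot as plt  # imported here so the layout helper has no dependency

    positions, bar_width = grouped_bar_layout(len(categories), len(series))
    fig, ax = plt.subplots()
    for xs, (label, samples) in zip(positions, series.items()):
        counter = Counter(samples)
        # Missing categories count as 0, so every series fills every bin.
        heights = [counter[c] for c in categories]
        ax.bar(xs, heights, width=bar_width, label=label)
    ax.set_xticks(list(range(len(categories))))
    ax.set_xticklabels(categories, rotation=90)
    ax.legend()
    return fig, ax
```

With the lists from the answer, calling plot_grouped_bars(list("ABCDEF"), {"list1": list1, "list2": list2}) and then matplotlib.pyplot.show() should draw bins A to F for both lists side by side.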