I have a text file with 8 features and 1 class. The contents of the file (data.txt) are:

1,1,3,2,1,1,1,3,HIGH
1,1,3,1,2,1,1,3,HIGH
1,1,1,1,3,3,1,2,HIGH
1,3,2,1,3,3,3,3,HIGH
1,3,1,2,3,1,2,1,HIGH
2,3,1,2,1,2,2,1,HIGH
2,2,2,2,2,1,2,3,HIGH
2,2,1,1,1,2,2,3,HIGH
3,2,1,3,1,3,3,3,HIGH
3,2,1,2,2,3,3,2,HIGH

In the above file, the first 8 columns are the features. Each feature value is a tag that can be 1, 2, or 3. The last column is the class name (HIGH). I want to plot these features grouped by tag number. I can do it for the first 3 columns with this code:

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_csv('data.txt', header=None)

# Features are : A,B,C,...,H
df.columns = ['A', 'B','C', 'D', 'E', 'F', 'G', 'H', 'class']

X = df.ix[:, 0:8].values
y = df.ix[:, 8].values

kind = ['barstacked']
deg = ['HIGH']
pos = ['left','right','mid']
col = ['r','b','y']

with plt.style.context('seaborn-whitegrid'):
    plt.figure(figsize=(8, 6))

    for j in range(0,3):
        for i in range(1):
                plt.hist(X[y == deg[i], j],
                     label=deg[i],
                     bins=30,
                     alpha=0.6, histtype=kind[i], align=pos[j], color=col[j])

    plt.tick_params(axis='both', which='major', labelsize=17)
    plt.xlim(0.75, 3.25)
    plt.tight_layout()
    plt.savefig("figure.png" , format='png', dpi=700)
    plt.show()

Here is the result: [image]

However, I could not plot the other 5 columns because I did not know how to put them next to each other: there are only 3 align options (left, mid, and right). What I am looking for is a histogram plot of all 8 features that groups the features by tag number. A graph like this:

[image]

Sheldore
Leo

1 Answer

You don't need a histogram here. You can generate the required figure with a bar chart, because each bar is just the frequency of one tag value in one feature. The idea is as follows:

  • Use Counter from the collections module to get the frequencies of 1, 2, and 3 in each feature.
  • The x-positions for your bar chart will be centered around 1, 2, and 3. To get the desired effect, offset the x-positions per feature: with an offset of (j-4)*0.1 added to the x-values, the first 4 bars land to the left of 1, 2, 3, the fifth is centered on them, and the remaining 3 land to the right. Here 0.1 is also a convenient bar width, so adjacent bars touch without overlapping.
  • You don't need the additional loop over i, since i is always 0.
  • df.ix is deprecated in newer pandas versions. Use df.iloc instead.

Here is how you can do it:

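from collections import Counter

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_csv('data.txt', header=None)
df.columns = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'class']

# df.ix is deprecated; use positional indexing with iloc instead
X = df.iloc[:, 0:8].values
y = df.iloc[:, 8].values

deg = ['HIGH']

# On Matplotlib 3.6+, this style is named 'seaborn-v0_8-whitegrid'
with plt.style.context('seaborn-whitegrid'):
    plt.figure(figsize=(8, 6))
    for j in range(8):
        # Count how often each tag (1, 2, 3) occurs in feature j
        freqs = Counter(X[y == deg[0], j])
        xvalues = np.array(list(freqs.keys()))
        # Shift feature j's bars by (j-4)*0.1 so the 8 bars sit side by side
        plt.bar(xvalues + (j - 4) * 0.1, list(freqs.values()), width=0.1,
                alpha=0.9, edgecolor='k', lw=2)
    plt.tick_params(axis='both', which='major', labelsize=17)
    plt.xlim(0.25, 3.75)
    plt.xticks([1, 2, 3])
    plt.tight_layout()
    plt.show()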
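
To see where each bar lands, the counting and the (j-4)*0.1 offset can be reproduced without any plotting. The following is a minimal sketch using only the standard library; the helper name bar_positions and the hard-coded rows list (copied from data.txt in the question, class column dropped) are assumptions made for illustration, not part of the original code.

```python
from collections import Counter

# Feature rows copied from data.txt in the question (class column dropped).
rows = [
    [1, 1, 3, 2, 1, 1, 1, 3],
    [1, 1, 3, 1, 2, 1, 1, 3],
    [1, 1, 1, 1, 3, 3, 1, 2],
    [1, 3, 2, 1, 3, 3, 3, 3],
    [1, 3, 1, 2, 3, 1, 2, 1],
    [2, 3, 1, 2, 1, 2, 2, 1],
    [2, 2, 2, 2, 2, 1, 2, 3],
    [2, 2, 1, 1, 1, 2, 2, 3],
    [3, 2, 1, 3, 1, 3, 3, 3],
    [3, 2, 1, 2, 2, 3, 3, 2],
]
names = "ABCDEFGH"


def bar_positions(rows, width=0.1):
    """Return {feature index: [(x position, count), ...]} for a grouped bar chart.

    Hypothetical helper: it mirrors the loop in the answer, where feature j
    is shifted by (j - 4) * width relative to the tag values 1, 2, 3.
    """
    n_features = len(rows[0])
    layout = {}
    for j in range(n_features):
        # Frequency of each tag value in feature j
        freqs = Counter(row[j] for row in rows)
        # For 8 features this is exactly (j - 4) * width
        offset = (j - n_features // 2) * width
        layout[j] = [(tag + offset, count) for tag, count in sorted(freqs.items())]
    return layout


layout = bar_positions(rows)
for j, bars in layout.items():
    print(names[j], [(round(x, 2), count) for x, count in bars])
```

Each feature contributes one bar per tag value, so the 8 features give 24 bars in total, and the leftmost bar in each group belongs to feature A.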

Sheldore
  • Thank you. It was exactly what I was looking for. Is there any way to put the number corresponding to the height of each bar at the top of that bar? For example, I want to see the number 5 at the top of the leftmost blue bar, and likewise for the other 23 bars. – Leo Mar 20 '19 at 02:07
  • @Leo: It is possible. There are already several Stack Overflow answers on how to do this, and I recommend reading them. For example: [this](https://stackoverflow.com/questions/28931224/adding-value-labels-on-a-matplotlib-bar-chart), [this](https://stackoverflow.com/questions/44049999/matplotlib-way-to-annotate-bar-plots-with-lines-and-figures), and the [official example](https://matplotlib.org/examples/api/barchart_demo.html). Try them out, and if they don't work, I recommend posting a new question describing the specific problem. – Sheldore Mar 20 '19 at 02:09
  • Sure, I will check them out. Thanks again for your help. – Leo Mar 20 '19 at 02:14
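
For the follow-up about writing each bar's height above it, one possible approach is to call ax.text once per bar. The sketch below is not taken from the linked answers; the function names labelled_bars and plot_with_labels are hypothetical, and the columns argument is assumed to be a list of per-feature value lists.

```python
from collections import Counter


def labelled_bars(values, offset=0.0):
    """Return (x positions, heights, label texts) for one feature column."""
    freqs = Counter(values)
    tags = sorted(freqs)
    xs = [tag + offset for tag in tags]
    heights = [freqs[tag] for tag in tags]
    labels = [str(h) for h in heights]
    return xs, heights, labels


def plot_with_labels(columns, width=0.1):
    """Draw the grouped bars and write each bar's height above it."""
    # Imported here so that labelled_bars stays usable without Matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 6))
    for j, values in enumerate(columns):
        offset = (j - len(columns) // 2) * width
        xs, heights, labels = labelled_bars(values, offset)
        ax.bar(xs, heights, width=width, edgecolor='k')
        for x, h, text in zip(xs, heights, labels):
            # Centre the text horizontally and sit it just above the bar top.
            ax.text(x, h, text, ha='center', va='bottom')
    ax.set_xticks([1, 2, 3])
    return fig, ax


# Column A of data.txt: five 1s, three 2s, two 3s.
column_a = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]
print(labelled_bars(column_a, offset=-0.4))
```

Calling plot_with_labels with the 8 feature columns and then plt.show() should display the labelled chart. On Matplotlib 3.4 or newer, Axes.bar_label applied to the container returned by ax.bar is an alternative to the inner ax.text loop.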