I'm loading data from disk with tf.data.experimental.CsvDataset and plotting it as follows:

# Transformation applied to the loaded data before plotting the histogram.
def preprocess(*fields):
    print(len(fields))
    features = tf.stack(fields[:-1])
    labels = tf.stack([int(x) for x in fields[-1:]])
    return features, labels  # x, y

for features, label in train_ds.take(1000):
    # print(features[0])
    plt.hist(features.numpy().flatten(), bins=101)

This produces the following histogram:

[Image: the resulting histogram]

But I want to plot the distribution of the values of the 712 features against the binary class labels. That is, what values do features 1, 2, or 3 take when the class label is 0?

How can I do that with pyplot?

I have read the following threads, but none of them helped:

Plotting histograms against classes in pandas / matplotlib

Histogram color by class

How to draw an histogram with multiple categories in python

DevLoverUmar

1 Answer

You can use np.fromiter to collect all the labels into a NumPy array, then pass that array to plt.hist:

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

train, test = tf.keras.datasets.mnist.load_data()

ds = tf.data.Dataset.from_tensor_slices(train)

vals = np.fromiter(ds.map(lambda x, y: y), float)

plt.hist(vals)
plt.xticks(range(10))
plt.title('Label Frequency')
plt.show()

[Image: histogram titled 'Label Frequency']
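For the per-class feature distribution the question asks about, one possible approach is to histogram a single feature column separately for each label, using shared bin edges so the classes are comparable. The following is only a minimal sketch: it assumes the dataset has already been collected into two NumPy arrays, `X` of shape (n_samples, n_features) and `y` of shape (n_samples,), and the helper names `per_class_histograms` and `plot_feature_by_class`, as well as the synthetic data, are hypothetical and not part of the original code.

```python
import numpy as np


def per_class_histograms(X, y, feature, bins=30):
    """Histogram one feature column separately for each class label.

    Returns the shared bin edges and a dict mapping label -> counts.
    """
    column = X[:, feature]
    # Shared edges so the per-class histograms are directly comparable.
    edges = np.histogram_bin_edges(column, bins=bins)
    counts = {
        int(label): np.histogram(column[y == label], bins=edges)[0]
        for label in np.unique(y)
    }
    return edges, counts


def plot_feature_by_class(X, y, feature, bins=30):
    """Overlay the class-wise histograms of one feature with pyplot."""
    # Imported lazily: only the plotting step needs matplotlib.
    import matplotlib.pyplot as plt

    edges, _ = per_class_histograms(X, y, feature, bins)
    for label in np.unique(y):
        plt.hist(X[y == label, feature], bins=edges, alpha=0.5,
                 label=f"class {int(label)}")
    plt.xlabel(f"feature {feature}")
    plt.ylabel("count")
    plt.legend()
    plt.show()


# Synthetic stand-in data (an assumption, not the asker's CSV):
# 1000 samples, 712 features, binary labels shifting the feature mean.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
X = rng.normal(loc=y[:, None], scale=1.0, size=(1000, 712))

edges, counts = per_class_histograms(X, y, feature=0)
```

Calling `plot_feature_by_class(X, y, feature=0)` would then draw the two class histograms for the first feature on one axis. With 712 features, it is probably more practical to loop over a handful of feature indices, or to arrange them in a grid of subplots, than to plot all of them at once.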

Nicolas Gervais
  • Thanks for your input! Sorry for the inconvenience, but I want the distribution of the data: to be precise, the floating-point values of the 712 features against the two binary classes, 0 and 1. – DevLoverUmar Jan 06 '21 at 15:44
  • Or as per the discussion on this answer, I think I'm using the wrong graph. https://stackoverflow.com/a/52060028/7344164 – DevLoverUmar Jan 06 '21 at 16:07
  • I think I want something like this https://matplotlib.org/gallery/lines_bars_and_markers/xcorr_acorr_demo.html#sphx-glr-gallery-lines-bars-and-markers-xcorr-acorr-demo-py – DevLoverUmar Jan 06 '21 at 16:17