I have a dataset of observation frequencies and am doing exploratory data analysis. Here is a sample:

x1    x2    x3    x4    label
15    10    1     2     0
3     2     15    10    1
0     1     10    11    1
9     7     1     1     0

I want to draw a single plot in Python that puts x1, x2, ..., xn on the x-axis and the frequencies for every record on the y-axis, and color-codes the plot by label, i.e. blue for label 0 and red for label 1. The objective is to see whether there is a relation between the class label and the values of the variables. How can I do that in Python?

The example scatter plots I have found color the points by class, but they use one variable on the x-axis and another on the y-axis. I want to use all variables on the x-axis and their frequencies on the y-axis.

Haroon S.
  • The question of whether to use a bar chart or a scatter plot should be based on what you want to visualize (your data, your question), not on whether it is possible to color it in a certain way. Perhaps ask about the measure or method for determining relations on the statistics/data analysis Stack Exchange: stats.stackexchange.com – Dominique Fuchs Dec 01 '17 at 13:17
  • I want something like this. https://plot.ly/~RPlotBot/4336/petallength-vs-sepallength.png But it has one variable on x-axis, another on y-axis. I want to use all 4 variables in single plot with different colors based on label value. – Haroon S. Dec 01 '17 at 14:00
  • Great, so you already have an input and an expected output. What is missing now is a clear problem. What prevents you from just plotting your data and coloring the points? Look at other questions here and state clearly in what way they do not help. – ImportanceOfBeingErnest Dec 01 '17 at 14:15

1 Answer

I am sorry if the question sounded too vague and naive. I am new to Python and data science. The following code gave the required output:

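import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(123)  # fixed seed so the random demo data is reproducible

# Generate demo data: one random y value and one random 0/1 label per x position
nbr_dim = 10
y = np.random.random(nbr_dim)
x = list(range(1, nbr_dim + 1))
labels = np.random.choice([0, 1], nbr_dim)
df = pd.DataFrame(dict(x=x, y=y, label=labels))

# Split the rows by label so each class is drawn as its own series
groups = df.groupby('label')

# Plot: one marker-only series per label; matplotlib assigns a color to each
fig, ax = plt.subplots()
ax.margins(0.05)  # small padding so markers are not clipped at the edges
for name, group in groups:
    ax.plot(group.x, group.y, marker='o', linestyle='', ms=12, label=name)
ax.legend()

plt.show()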

Original answer given by Joe Kingston.
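
Applied to the four sample rows from the question, the same idea could be sketched roughly as follows. This is only a minimal sketch under my own assumptions: each record is drawn as one line across x1..x4, with blue for label 0 and red for label 1, and the names `features`, `colors` and `handles` are illustrative rather than part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D

# Sample rows copied from the question
df = pd.DataFrame(
    {
        "x1": [15, 3, 0, 9],
        "x2": [10, 2, 1, 7],
        "x3": [1, 15, 10, 1],
        "x4": [2, 10, 11, 1],
        "label": [0, 1, 1, 0],
    }
)

features = ["x1", "x2", "x3", "x4"]  # variables placed along the x-axis
colors = {0: "blue", 1: "red"}       # color scheme requested in the question

fig, ax = plt.subplots()
positions = list(range(len(features)))

# Draw one line per record, colored by that record's label
for _, row in df.iterrows():
    ax.plot(
        positions,
        row[features].to_numpy(),
        marker="o",
        color=colors[int(row["label"])],
        alpha=0.7,
    )

# Show the variable names instead of 0..3 on the x-axis
ax.set_xticks(positions)
ax.set_xticklabels(features)
ax.set_xlabel("variable")
ax.set_ylabel("frequency")

# One legend entry per label rather than one per record
handles = [
    Line2D([], [], color=color, marker="o", label=f"label {value}")
    for value, color in colors.items()
]
ax.legend(handles=handles)
```

Calling `plt.show()` afterwards displays the figure. With this sample, the blue lines are high at x1 and x2 while the red lines are high at x3 and x4, which is the kind of pattern the question is looking for. For many records, a marker-only plot (`linestyle=''`, as in the code above) may be easier to read than lines.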

halfer
Haroon S.