
I am plotting my pandas data using matplotlib. My plot looks like this:

[image: the current scatter plot]

There are four classes in the dataset. I want to color the background area for each class, something like this:

[image: the desired plot, with a colored background area per class]

My matplotlib code looks like this:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('normalized.csv')
fig = plt.figure(figsize=(8, 8))
plt.scatter(df['p1'], df['p2'], c=list(df['cs']), alpha=0.9)
plt.show()

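I also tried seaborn for this:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

df = pd.read_csv('normalized.csv')
# `size` was renamed to `height` in seaborn 0.9
sn.FacetGrid(df, hue="cs", height=8).map(plt.scatter, "p1", "p2").add_legend()
plt.show()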

How can I fill the background area for the four classes with either module?

Aaditya Ura
  • Your first question would probably be how would one define `the backgroud area for each class` – Quang Hoang Sep 16 '20 at 21:40
  • You need to train a classifier to predict the color of the background. Then you can use [np.meshgrid](https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html) to create a mesh behind your scatter plot. [Here is an example](https://stackoverflow.com/questions/45075638/graph-k-nn-decision-boundaries-in-matplotlib) with a K-nearest-neighbor classifier. – Michael Szczesny Sep 16 '20 at 21:51

2 Answers


A filled contour could serve as background:

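import numpy as np
import matplotlib.pyplot as plt

N = 100  # points per group
M = 4    # number of groups
# M random cluster centers, repeated N times, with unit-variance Gaussian noise added
points = np.random.normal(np.tile(np.random.uniform(1, 10, 2 * M), N)).reshape(-1, 2)
group = np.tile(np.arange(M), N)  # labels cycle 0, 1, ..., M-1 to match the tiled centers

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 5), sharey=True, sharex=True)
# plt.cm.get_cmap was removed in Matplotlib 3.9; plt.get_cmap accepts the same arguments
cmap = plt.get_cmap('tab10', M)
ax1.scatter(points[:, 0], points[:, 1], c=group, cmap=cmap)  # left: scatter only

ax2.scatter(points[:, 0], points[:, 1], c=group, cmap=cmap)
# right: filled contour of the labels behind the points, one band per class
ax2.tricontourf(points[:, 0], points[:, 1], group, levels=np.arange(-0.5, M), zorder=0, cmap=cmap, alpha=0.3)
plt.show()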

example plot with tricontourf

Note that the contour plot also creates narrow zones of in-between values: it treats the class labels as numbers and interpolates between them, so it assumes that between a zone 0 and a zone 2 there must be a small zone 1.
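The in-between zones can be illustrated without plotting anything. The following is only a minimal sketch of the idea, assuming simple linear interpolation along one edge between a class-0 point and a class-2 point (a one-dimensional stand-in for what happens across a triangle); the names `t`, `values`, and `bands` are made up for the illustration.

```python
import numpy as np

# Same band edges as in the tricontourf call: [-0.5, 0.5, 1.5, 2.5, 3.5]
levels = np.arange(-0.5, 4)

# Positions along an edge that runs from a class-0 point to a class-2 point
t = np.linspace(0, 1, 11)

# Linearly interpolated "label" along that edge (assumed 1-D simplification)
values = np.interp(t, [0, 1], [0, 2])

# Index of the contour band each interpolated value falls into
bands = np.digitize(values, levels) - 1
print(bands)
```

The middle of the edge lands in band 1 even though no class-1 point is nearby, which is the narrow spurious zone described above.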

A slightly more involved approach fits a nearest-neighbor classifier and colors a grid by its predictions:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn import neighbors

N = 100  # points per group
M = 4    # number of groups
points = np.random.normal(np.tile(np.random.uniform(1, 10, 2 * M), N)).reshape(-1, 2)
groups = np.tile(np.arange(M), N)

fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(14, 5), sharey=True, sharex=True)
cmap = ListedColormap(['orange', 'cyan', 'cornflowerblue', 'crimson'])
ax1.scatter(points[:, 0], points[:, 1], c=groups, cmap=cmap)  # left: scatter only

ax2.scatter(points[:, 0], points[:, 1], c=groups, cmap=cmap)

# fit a 10-nearest-neighbor classifier on the labelled points
clf = neighbors.KNeighborsClassifier(10)
clf.fit(points, groups)

# regular grid covering the data with a margin of 1 on each side
x_min, x_max = points[:, 0].min() - 1, points[:, 0].max() + 1
y_min, y_max = points[:, 1].min() - 1, points[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 50),
                     np.linspace(y_min, y_max, 50))
# predict a class for every grid node and draw the result as a background image
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
ax2.imshow(Z, extent=[x_min, x_max, y_min, y_max], cmap=cmap, alpha=0.3, aspect='auto', origin='lower')
plt.show()

example with nearest neighbor
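The same idea could be applied to the DataFrame from the question roughly as follows. This is a sketch under assumptions, not a tested recipe for the real data: the column names `p1`, `p2`, and `cs` come from the question, but the synthetic frame below merely stands in for `normalized.csv`, and `pd.factorize` is used here (my addition) so that the class labels become codes 0..M-1 even if `cs` holds strings.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for pd.read_csv('normalized.csv'): synthetic data with four classes
rng = np.random.default_rng(0)
centers = np.array([[2, 2], [2, 8], [8, 2], [8, 8]])
cs = np.repeat(np.arange(4), 50)
xy = centers[cs] + rng.normal(size=(200, 2))
df = pd.DataFrame({"p1": xy[:, 0], "p2": xy[:, 1], "cs": cs})

# Encode the class labels as integer codes 0..M-1 so they index the colormap
codes, classes = pd.factorize(df["cs"], sort=True)
cmap = ListedColormap(["orange", "cyan", "cornflowerblue", "crimson"][:len(classes)])

# Fit the classifier on the two plotted columns
clf = KNeighborsClassifier(n_neighbors=10).fit(df[["p1", "p2"]].to_numpy(), codes)

# Predict a class for every node of a grid covering the data
x_min, x_max = df["p1"].min() - 1, df["p1"].max() + 1
y_min, y_max = df["p2"].min() - 1, df["p2"].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

# Background image first, then the points; fixed vmin/vmax keep the colors aligned
fig, ax = plt.subplots(figsize=(8, 8))
ax.imshow(Z, extent=[x_min, x_max, y_min, y_max], origin="lower", aspect="auto",
          cmap=cmap, alpha=0.3, vmin=0, vmax=len(classes) - 1)
ax.scatter(df["p1"], df["p2"], c=codes, cmap=cmap, vmin=0, vmax=len(classes) - 1)
```

Call `plt.show()` afterwards to display the figure. Passing explicit `vmin` and `vmax` is a design choice in this sketch: it keeps the background and point colors consistent even if some class is never predicted on the grid.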

JohanC

If you don't need to fill the whole plane and don't mind the areas overlapping (your data points show some overlap), you can fill the convex hull of each class's points.

import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import ConvexHull

N = 100
# four groups of N points, each scattered around its own random center
points = [np.random.normal(np.tile(np.random.uniform(1, 5, 2), N)).reshape(-1, 2) for i in range(4)]
colors = ['r', 'g', 'b', 'k']
for k in range(4):
    hull = ConvexHull(points[k])
    plt.plot(points[k][:, 0], points[k][:, 1], '.', color=colors[k])
    # hull.vertices lists the hull points in counterclockwise order (2-D input)
    plt.fill(points[k][hull.vertices, 0], points[k][hull.vertices, 1], color=colors[k], alpha=0.3)
plt.show()
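With labelled data in a single DataFrame, as in the question, the per-class subsets could come from a `groupby`. The sketch below assumes the question's column names `p1`, `p2`, and `cs`; the synthetic frame is only a stand-in for `normalized.csv`, and `hull_sizes` is a hypothetical helper dict kept just to inspect the result.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull

# Stand-in for pd.read_csv('normalized.csv'): synthetic data with four classes
rng = np.random.default_rng(1)
centers = np.array([[2, 2], [2, 8], [8, 2], [8, 8]])
cs = np.repeat(np.arange(4), 50)
xy = centers[cs] + rng.normal(size=(200, 2))
df = pd.DataFrame({"p1": xy[:, 0], "p2": xy[:, 1], "cs": cs})

colors = ["r", "g", "b", "k"]
fig, ax = plt.subplots(figsize=(8, 8))
hull_sizes = {}  # number of hull vertices per class, for inspection only
for color, (label, grp) in zip(colors, df.groupby("cs")):
    pts = grp[["p1", "p2"]].to_numpy()
    hull = ConvexHull(pts)  # needs at least 3 non-collinear points per class
    hull_sizes[label] = len(hull.vertices)
    ax.plot(pts[:, 0], pts[:, 1], ".", color=color, label=str(label))
    # fill the polygon spanned by the hull vertices of this class
    ax.fill(pts[hull.vertices, 0], pts[hull.vertices, 1], color=color, alpha=0.3)
ax.legend(title="cs")
```

Call `plt.show()` afterwards to display the figure. Classes with fewer than three points, or with all points on one line, would make `ConvexHull` raise an error and would need separate handling.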


heracho