
I am running Python 3.6 with Pandas version 0.19.2. On the code example below, I have two questions regarding the Pandas plotting function scatter_matrix():

**1.** How can I colour-label the observations in the scatter plots with respect to the Label column?

**2.** How can I specify the number of bins for the histograms on the diagonal? Can I do this individually, or only set one bin number for all of them?

import pandas as pd
import numpy as np

N = 1000
df_feat = pd.DataFrame(np.random.randn(N, 4), columns=['A','B','C','D'])
df_label = pd.DataFrame(np.random.choice([0,1], N), columns=['Label'])
df = pd.concat([df_feat, df_label], axis=1)
axes = pd.tools.plotting.scatter_matrix(df, alpha=0.2)

This question is related to a more general one.

Zhubarb

1 Answer


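To answer your first question: there may be a less kludgy way, but you can pass a list of colours, one per row, through the c keyword, which scatter_matrix forwards to the underlying scatter calls: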

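from pandas.tools.plotting import scatter_matrix  # pandas 0.19.2; from 0.20 on: from pandas.plotting import scatter_matrix
scatter_matrix(df, c=['r' if i == 1 else 'b' for i in df['Label']])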
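For more than two classes, a dictionary lookup may read more cleanly than the conditional. The following is only a sketch under a few assumptions: it uses pd.plotting.scatter_matrix (pandas 0.20 or later; on 0.19.2 the function lives in pd.tools.plotting), the colour_map name and its colours are made up for illustration, and the Label column is left out of the plotted frame so that it only drives the colouring.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

N = 1000
rng = np.random.RandomState(0)
df_feat = pd.DataFrame(rng.randn(N, 4), columns=["A", "B", "C", "D"])
df_label = pd.DataFrame(rng.choice([0, 1], N), columns=["Label"])
df = pd.concat([df_feat, df_label], axis=1)

# Hypothetical mapping from label value to colour; extend it for more classes.
colour_map = {0: "b", 1: "r"}
colours = df["Label"].map(colour_map).tolist()

# Plot only the feature columns; c is forwarded to each scatter call,
# so it needs one colour per row.
axes = pd.plotting.scatter_matrix(df[["A", "B", "C", "D"]], c=colours, alpha=0.2)
```

The colours only affect the off-diagonal scatter plots; the histograms on the diagonal are drawn for the whole column.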

To answer the second:

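scatter_matrix accepts a hist_kwds dictionary and passes its entries as keyword arguments to the histogram call that draws each diagonal plot, so a single bins value applies to all of the histograms: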

scatter_matrix(df, hist_kwds={'bins': 5})
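As far as I know there is no keyword for setting the bins per column, so one workaround is to redraw the diagonal histograms on the returned axes. This is a rough sketch rather than an official recipe: it assumes pd.plotting.scatter_matrix (pandas 0.20 or later), and bins_per_column is a hypothetical mapping with values picked only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)
df_feat = pd.DataFrame(rng.randn(1000, 4), columns=["A", "B", "C", "D"])

# One bin count for every diagonal histogram.
axes = pd.plotting.scatter_matrix(df_feat, alpha=0.2, hist_kwds={"bins": 5})

# Hypothetical per-column bin counts, chosen only for illustration.
bins_per_column = {"A": 5, "B": 10, "C": 20, "D": 40}

for i, col in enumerate(df_feat.columns):
    ax = axes[i, i]
    # Drop the bars drawn by scatter_matrix, then draw a new histogram.
    for patch in list(ax.patches):
        patch.remove()
    ax.hist(df_feat[col], bins=bins_per_column[col], color="C0")
    # Ask matplotlib to rescale the y axis to the new bar heights;
    # the x limits stay as scatter_matrix set them.
    ax.relim()
    ax.autoscale_view(scalex=False)
```

One caveat: scatter_matrix positions the y tick labels of the top-left cell by hand, so after redrawing that histogram they may no longer line up and might need adjusting.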

G. Anderson