Python 2.7. I need help creating a 2D scatter plot from a NumPy array with three columns, where column 0 holds the group tag and columns 1 and 2 hold the coordinates (X, Y).

The NumPy array looks like this:

array([['A', '4.83186388889', '2.34534666667'],
   ['A', '4.87818611111', '2.80832888889'],
   ['A', '4.82518611111', '2.33834222222'],
   ['B', '4.53763888889', '-11.88424'],
   ['B', '4.503125', '-11.9406266667'],
   ['B', '4.45975555556', '-11.9688044444'],
   ['C', '6.12376666667', '-9.61480888889'],
   ['C', '6.20991666667', '-9.66523111111'],
   ['C', '6.12281388889', '-9.61702222222'],
   ['D', '6.46020833333', '-11.9756488889'],
   ['D', '6.43584166667', '-11.8586622222'],
   ['D', '6.43401111111', '3.88036888889'],
   ....
   dtype='|S21')

A dictionary cannot be used because it stores only unique keys (groups), and I do not know how to convert the array into a pandas DataFrame with the proper format.

I previously tried the following. The DataFrame printed fine, but it did not work for the chart.

dataset = pd.DataFrame(array, columns=['Description', 'X', 'Y'])
dataset[['X','Y']] = dataset[['X','Y']].apply(pd.to_numeric)

I'd like to create a single 2D scatter plot covering all of my group tags (A, B, C, ...), each with multiple sets of coordinates (x, y), using a separate color per group.

Looking forward to your help.

marc_s
wounky
  • Did you even search for how to do it in the first place? – LoneWanderer May 04 '19 at 22:19
  • I searched but did not find a corresponding example. – wounky May 04 '19 at 22:22
  • You could also use seaborn, which would resolve your color issue easily: https://seaborn.pydata.org/ or https://python-graph-gallery.com/scatter-plot/. The latter covers your use case almost exactly, including matplotlib explanations. – LoneWanderer May 04 '19 at 22:33

1 Answer

You don't need pandas for plotting, just matplotlib. You can iterate over the array and pass each XY coordinate to plt.scatter. You could even use a structure (like a dictionary) where you define a specific color for each group:

import matplotlib.pyplot as plt

colors = {'A': 'red', 
          'B': 'blue',
          'C': 'green',
          'D': 'black'}    
for group, x, y in array:
    plt.scatter(float(x), float(y), color=colors[group])
plt.show()

Edit: use this instead to dynamically create a random color for each group, however many there are:

from random import random
import matplotlib.pyplot as plt

colors = {}    
for group, x, y in array:
    plt.scatter(float(x), float(y), color=colors.setdefault(group, (random(), random(), random())))
plt.show()
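
As an alternative to one `plt.scatter` call per point, the rows can be split by group tag with a boolean mask so that each group is drawn in a single call. The sketch below is only an illustration under a few assumptions: `scatter_by_group` is a hypothetical helper name, the sample rows are copied from the question, and the colors come from matplotlib's default color cycle rather than from an explicit dictionary.

```python
import numpy as np
import matplotlib.pyplot as plt


def scatter_by_group(array, ax=None):
    """Draw one scatter series per group tag (hypothetical helper)."""
    if ax is None:
        ax = plt.gca()
    data = np.asarray(array)
    tags = data[:, 0]                    # column 0: group tag
    xy = data[:, 1:3].astype(float)      # columns 1 and 2: X and Y as floats
    for tag in np.unique(tags):
        mask = tags == tag               # rows that belong to this group
        # Successive scatter calls on the same axes step through the
        # default color cycle, so each group gets its own color.
        ax.scatter(xy[mask, 0], xy[mask, 1], label=str(tag))
    ax.legend()                          # one legend entry per group
    return ax


# A few rows taken from the question's array.
sample = np.array([['A', '4.83186388889', '2.34534666667'],
                   ['A', '4.87818611111', '2.80832888889'],
                   ['B', '4.53763888889', '-11.88424'],
                   ['B', '4.503125', '-11.9406266667']])
ax = scatter_by_group(sample)
# Call plt.show() afterwards to display the figure.
```

Because every group is plotted once, the legend comes out with a single entry per group and needs no deduplication.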
jfaccioni
  • OK. I'm also wondering how to include a legend for the groups on the chart. Could you tell me whether it should work automatically? – wounky May 04 '19 at 22:17
  • Matplotlib tries to do this automatically. You *could* add the keyword argument `label=group` to the `plt.scatter` call, and then call `plt.legend()` once before calling `plt.show()`. But the issue here is that duplicate legend entries will be created (one for each individual scatter point). Refer to [this question](https://stackoverflow.com/questions/13588920/stop-matplotlib-repeating-labels-in-legend) to avoid this. – jfaccioni May 04 '19 at 22:21
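
One way to avoid the repeated legend entries mentioned above, while keeping the per-point loop, is to collapse the handles by label before building the legend. This is a minimal sketch, assuming a fixed color per group as in the answer; `points` and `unique` are illustrative names, and the rows are a small sample from the question.

```python
import matplotlib.pyplot as plt

colors = {'A': 'red', 'B': 'blue'}
# A few rows from the question: (group tag, X, Y), all stored as strings.
points = [('A', '4.83186388889', '2.34534666667'),
          ('A', '4.87818611111', '2.80832888889'),
          ('B', '4.53763888889', '-11.88424'),
          ('B', '4.503125', '-11.9406266667')]

fig, ax = plt.subplots()
for group, x, y in points:
    # Every point carries its group label, so labels repeat.
    ax.scatter(float(x), float(y), color=colors[group], label=group)

# Keep a single handle per label; later duplicates overwrite earlier ones.
handles, labels = ax.get_legend_handles_labels()
unique = dict(zip(labels, handles))
legend = ax.legend(list(unique.values()), list(unique.keys()))
# Call plt.show() afterwards to display the figure.
```

Since all points in a group share one color here, it does not matter which of the duplicate handles survives.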