
My experience with Python is pretty basic. I have written Python code to import data from an external file and perform a calculation. My result looks something like this (except much larger in reality).

```
1   1
1   1957
1   0.15
2   346
2   0.90
2   100
3   1920
3   100
3   40
```

What I want to do is plot these two columns as a single series, but then distinguish each data point according to a certain pattern. I know this sounds unnecessarily complicated, but it's something I need to do to help out the people who will use my code. Unfortunately, my Python skills fail me here. More specifically:

1. The first column has "1," "2," or "3." So first I want to make all the "1" data points circles (for example), all the "2" data points some other symbol, and likewise for the "3" data points.

2. Next, there are three rows for each distinct number. So for "1," the "0.15" in the second column is the average value, the "1957" is the maximum value, and the "1" is the minimum value. I want to make the data point associated with each number's average value (the top row for each number) green (for example). I want the maximum and minimum values to have their own colors too.
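Taken together, the two rules map every row to a (marker, color) pair: the group number picks the marker, and the row's position within its group picks the color. A minimal sketch of that lookup, assuming each group's rows are contiguous and always ordered average, maximum, minimum; the names `MARKERS`, `ROLES`, `ROLE_COLORS` and `style_for_rows`, and the specific symbols and colors, are hypothetical placeholders:

```python
# Hypothetical lookup tables: group number -> marker, row role -> color.
MARKERS = {1: "o", 2: "s", 3: "D"}          # circle, square, diamond
ROLES = ("average", "maximum", "minimum")   # assumed order of rows within a group
ROLE_COLORS = {"average": "green", "maximum": "red", "minimum": "blue"}


def style_for_rows(groups):
    """Return one (marker, color, role) tuple per row.

    Assumes rows of the same group are contiguous and appear in the
    order given by ROLES (average, maximum, minimum).
    """
    styles = []
    position = 0
    previous = None
    for group in groups:
        # Restart the position counter whenever a new group begins.
        position = position + 1 if group == previous else 0
        previous = group
        role = ROLES[position]
        styles.append((MARKERS[group], ROLE_COLORS[role], role))
    return styles


groups = [1, 1, 1, 2, 2, 2, 3, 3, 3]
for group, style in zip(groups, style_for_rows(groups)):
    print(group, style)
```

Because the style is computed row by row, the data itself stays a single series; only the drawing step needs one call per marker, as the answers below discuss.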

So I will end up with a plot that shows one series only, but where each data point looks distinct. If anyone could please point me in the right direction, I would be very grateful. If I have not said this clearly, please let me know and I'll try again!

  • Does this [SO](https://stackoverflow.com/questions/14885895/color-by-column-values-in-matplotlib) address your problem? – Sameeresque May 20 '20 at 17:28
  • Thank you for that, Sameeresque. I don't think it does, but maybe I'm such a newbie that I'm not understanding something. I think the Python example at the link deals with what are already distinct series (like "Gender") instead of distinguishing points in a single series. –  May 20 '20 at 17:32
  • Can you provide an example of your calculation result? The table above is not very intuitive. – StupidWolf May 20 '20 at 17:42
  • I will do that, StupidWolf. Thanks for asking. Might take a little while... –  May 20 '20 at 17:51
  • @user3292696 Regarding your data, `min, mean, max = 1, 0.15, 1957`, how can the mean value be smaller than the min. value? – a_guest May 20 '20 at 18:13
  • It's just an example. I previously wrote my question in a more complicated way and simplified it fast. I'll edit. –  May 20 '20 at 18:18

2 Answers


What I would do is separate the data into three different columns so you have three series, then use `plt.scatter` with a different marker for each to get the desired effect.


```python
import matplotlib.pyplot as plt
import numpy as np

# Fixing random state for reproducibility
np.random.seed(19680801)

N = 100
r0 = 0.6
x = 0.9 * np.random.rand(N)
y = 0.9 * np.random.rand(N)
area = (20 * np.random.rand(N))**2  # 0 to 10 point radii
c = np.sqrt(area)
r = np.sqrt(x ** 2 + y ** 2)
area1 = np.ma.masked_where(r < r0, area)
area2 = np.ma.masked_where(r >= r0, area)
plt.scatter(x, y, s=area1, marker='^', c=c)
plt.scatter(x, y, s=area2, marker='o', c=c)
# Show the boundary between the regions:
theta = np.arange(0, np.pi / 2, 0.01)
plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))

plt.show()
```

source: https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/scatter_masked.html#sphx-glr-gallery-lines-bars-and-markers-scatter-masked-py
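Applied to the question's data, the gallery's masking trick lets every `scatter` call receive the same full `x` and `y` arrays, so the data never has to be split into separate series: `np.ma.masked_where` hides the points of the other groups, and matplotlib does not draw masked points. A minimal sketch, using the question's example data reordered as in the second answer (average, maximum, minimum within each group); the markers and colors are placeholder choices:

```python
import matplotlib.pyplot as plt
import numpy as np

# Question's example data: column 0 = group, column 1 = value.
# Rows within each group are assumed to be ordered average, maximum, minimum.
data = np.array([
    [1, 0.15], [1, 1957], [1, 1],
    [2, 346], [2, 0.90], [2, 100],
    [3, 1920], [3, 100], [3, 40],
])
x, y = data[:, 0], data[:, 1]

# One color per row, repeating (average, maximum, minimum) for every group.
colors = ["green", "red", "blue"] * 3
markers = {1: "o", 2: "s", 3: "D"}  # placeholder marker per group

fig, ax = plt.subplots()
for group, marker in markers.items():
    # Same full x/y arrays every call; points from other groups are masked out,
    # just as the gallery example masks its marker sizes.
    y_group = np.ma.masked_where(x != group, y)
    ax.scatter(x, y_group, c=colors, marker=marker)
plt.show()
```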

  • Thank you very much for that, Sritej. If I understand your code correctly though (and I may not--I'm not that experienced with Python), I would need to separate my series. I can't do that, unfortunately. Thank you, though! –  May 20 '20 at 18:17
  • One workaround that would keep it in a single series is to plot each point individually, as shown here: https://stackoverflow.com/questions/39692554/how-to-change-the-shape-of-the-marker-depending-on-a-column-variable – Sritej Attaluri May 20 '20 at 18:26
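The per-point workaround from the comment above can be sketched on the example data as follows. It assumes, as the second answer does, that each group's three rows are contiguous and ordered average, maximum, minimum; the marker and color choices are placeholders. One `scatter` call per row gets slow for very large data, but it keeps the data in a single array and decides each point's style in one place.

```python
import matplotlib.pyplot as plt
import numpy as np

# Question's example data, one row per point: (group, value).
data = np.array([
    [1, 0.15], [1, 1957], [1, 1],
    [2, 346], [2, 0.90], [2, 100],
    [3, 1920], [3, 100], [3, 40],
])

markers = {1: "o", 2: "s", 3: "D"}      # marker per group (placeholder choice)
role_colors = ["green", "red", "blue"]  # average, maximum, minimum (assumed row order)

fig, ax = plt.subplots()
for i, (group, value) in enumerate(data):
    # i % 3 is the row's position inside its block of three rows.
    ax.scatter(group, value, marker=markers[int(group)], color=role_colors[i % 3])
plt.show()
```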

For different marker styles you currently need to create separate plot instances, i.e. one `scatter` call per marker (see this GitHub issue). Different colors, on the other hand, can be set per point by passing an array as the color argument. For example:

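```python
import matplotlib.pyplot as plt
import numpy as np

# Column 0: group number; column 1: value.
# Each group's rows are assumed to be ordered average, maximum, minimum.
data = np.array([
    [1, 0.15],
    [1, 1957],
    [1, 1],
    [2, 346],
    [2, 0.90],
    [2, 100],
    [3, 1920],
    [3, 100],
    [3, 40],
])
x, y = np.transpose(data)

symbols = ['o', 's', 'D']             # one marker per group
colors = ['blue', 'orange', 'green']  # one color per row position within a group

# A single scatter call takes only one marker, so draw each group separately.
for value, marker in zip(np.unique(x), symbols):
    mask = (x == value)
    # Every group has exactly three rows, so the three colors line up with them.
    plt.scatter(x[mask], y[mask], marker=marker, color=colors)
plt.show()
```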
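Because the plot encodes two things at once (group by marker, row role by color), a legend with separate entries for each helps readers decode it. A minimal sketch extending the answer's code with proxy handles built from `matplotlib.lines.Line2D`; the role labels assume each group's rows are ordered average, maximum, minimum, and the label texts are placeholders:

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

data = np.array([
    [1, 0.15], [1, 1957], [1, 1],
    [2, 346], [2, 0.90], [2, 100],
    [3, 1920], [3, 100], [3, 40],
])
x, y = np.transpose(data)

symbols = ['o', 's', 'D']
colors = ['blue', 'orange', 'green']
roles = ['average', 'maximum', 'minimum']  # assumed row order within each group

fig, ax = plt.subplots()
for value, marker in zip(np.unique(x), symbols):
    mask = (x == value)
    ax.scatter(x[mask], y[mask], marker=marker, color=colors)

# Proxy artists: empty Line2D objects that exist only to be drawn in the legend.
marker_handles = [
    Line2D([], [], linestyle='None', marker=m, color='gray', label=f'group {int(v)}')
    for v, m in zip(np.unique(x), symbols)
]
color_handles = [
    Line2D([], [], linestyle='None', marker='o', color=c, label=r)
    for c, r in zip(colors, roles)
]
ax.legend(handles=marker_handles + color_handles)
plt.show()
```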
a_guest