Is there a way to use conditionals on colors in a scatter plot in Python?

Question

I'm new to data science. I have a practice dataset, and this is what I'm trying to do:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

file = pd.read_csv('datasets/office_episodes.csv')

x = np.array(file.loc[:,'episode_number'])

y = np.array(file.loc[:, 'viewership_mil'])

scaled_ratings = np.array(file.loc[:, 'scaled_ratings'])

ratings2 = list(scaled_ratings)
   
plt.title("Popularity, Quality, and Guest Appearances on the Office")

plt.xlabel("Episode Number")

plt.ylabel("Viewership (Millions)")

for i in ratings2:
    if i < 0.25:
         plt.scatter(x, y, c='red')
    elif i >=0.25 and i < 0.50:
          plt.scatter(x, y, c='orange')   
    elif i >= 0.50 and i < 0.75:
        plt.scatter(x, y, c='lightgreen')
    elif i >= 0.75:
        plt.scatter(x, y, c='darkgreen')
    else:
        plt.scatter(x, y, c='pink')


plt.show()

As you can see, in the for loop I'm setting the colors of the dots in the scatter plot based on the scaled ratings, but when the plot is displayed it looks like this (image not shown):

I also tried creating a variable called ratings3 from ratings2 using a list comprehension, so that I could pass ratings3 as the color parameter of the plt.scatter() function.

Use pd.cut to create categories, then plot color based on category. — Scott Boston, Jul 03 '21 at 22:30

score 0 · Answer 1 · answered Jul 03 '21 at 22:34

I am not an expert at this, but here is my solution. You would first have to make separate arrays for each category. Then you can plot each with the chosen colors.

ratings = file['scaled_ratings']

# Build one boolean mask per category. A chained comparison such as
# 0.25 <= ratings < 0.5 raises a ValueError on a Series, so combine with &.
masks = {
    'red': ratings < 0.25,
    'orange': (ratings >= 0.25) & (ratings < 0.5),
    'lightgreen': (ratings >= 0.5) & (ratings < 0.75),
    'darkgreen': ratings >= 0.75,
}

# Filter x and y with the same mask so both have matching lengths
for color, mask in masks.items():
    plt.scatter(file.loc[mask, 'episode_number'], file.loc[mask, 'viewership_mil'], c=color)

score 0 · Accepted Answer · answered Jul 03 '21 at 23:33

Some sample data and imports:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

n = 175
np.random.seed(15)
df = pd.DataFrame({
    'episode_number': np.random.randint(0, 180, n),
    'viewership_mil': np.random.randint(2_500_000, 12_500_000, n) / 1_000_000
})
df['scaled_ratings'] = df['viewership_mil'] / df['viewership_mil'].sum() * 100

df.head():

   episode_number  viewership_mil  scaled_ratings
0             140       12.414172        0.925457
1             133        9.918293        0.739393
2             119        7.513288        0.560104
3             128       11.664907        0.869600
4             156        8.610445        0.641895

Create categories based on scaled_ratings using pd.cut:

colors = pd.cut(
    df['scaled_ratings'],
    bins=[-np.inf, 0.25, .5, .75, np.inf],  # np.NINF was removed in NumPy 2.0; -np.inf works everywhere
    labels=['red', 'orange', 'lightgreen', 'darkgreen'],
    right=False
)

colors.head():

0       darkgreen
1      lightgreen
2      lightgreen
3       darkgreen
4      lightgreen
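
To see how values sitting exactly on a bin edge are handled, here is a small sketch on hand-picked numbers. The `sample` Series and its values are made up for illustration and are not taken from the dataset. With right=False each bin includes its lower edge, so 0.25 should land in 'orange' rather than 'red', which matches the `i >= 0.25 and i < 0.50` condition in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings chosen to sit on and around the bin edges
sample = pd.Series([0.10, 0.25, 0.49, 0.50, 0.75, 0.99])

sample_colors = pd.cut(
    sample,
    bins=[-np.inf, 0.25, 0.5, 0.75, np.inf],
    labels=['red', 'orange', 'lightgreen', 'darkgreen'],
    right=False,  # intervals are [low, high): lower edge included
)

print(sample_colors.tolist())
# ['red', 'orange', 'orange', 'lightgreen', 'darkgreen', 'darkgreen']
```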

Then plot scatter and specify c=:

fig, ax = plt.subplots()
ax.scatter(x=df['episode_number'], y=df['viewership_mil'], c=colors)
plt.title("Popularity, Quality, and Guest Appearances on the Office")
plt.xlabel("Episode Number")
plt.ylabel("Viewership (Millions)")
plt.show()

All together:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

n = 175
np.random.seed(15)
df = pd.DataFrame({
    'episode_number': np.random.randint(0, 180, n),
    'viewership_mil': np.random.randint(2_500_000, 12_500_000, n) / 1_000_000
})
df['scaled_ratings'] = df['viewership_mil'] / df['viewership_mil'].sum() * 100

# Assign Colors based on df['scaled_ratings']
colors = pd.cut(
    df['scaled_ratings'],
    bins=[-np.inf, 0.25, .5, .75, np.inf],  # np.NINF was removed in NumPy 2.0; -np.inf works everywhere
    labels=['red', 'orange', 'lightgreen', 'darkgreen'],
    right=False  # Lower-bound inclusive x >= .25 and x < .5
)
# Plot
fig, ax = plt.subplots()
ax.scatter(x=df['episode_number'], y=df['viewership_mil'], c=colors)
plt.title("Popularity, Quality, and Guest Appearances on the Office")
plt.xlabel("Episode Number")
plt.ylabel("Viewership (Millions)")
plt.show()
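
If you would rather keep something that reads like the if/elif chain from the question, np.select is one possible alternative to pd.cut. This is only a sketch: the `ratings` array below is made-up data, and `point_colors` is a name I chose for illustration. The `default` value plays the role of the final `else` branch, so a value that matches no condition (such as NaN) gets 'pink'.

```python
import numpy as np

# Hypothetical ratings, for illustration only (NaN matches no condition)
ratings = np.array([0.10, 0.30, 0.60, 0.80, np.nan])

# One boolean array per branch of the original if/elif chain
conditions = [
    ratings < 0.25,
    (ratings >= 0.25) & (ratings < 0.50),
    (ratings >= 0.50) & (ratings < 0.75),
    ratings >= 0.75,
]
choices = ['red', 'orange', 'lightgreen', 'darkgreen']

# For each element, pick the choice of the first condition that is True
point_colors = np.select(conditions, choices, default='pink')

print(point_colors.tolist())
# ['red', 'orange', 'lightgreen', 'darkgreen', 'pink']
```

The resulting array has one color per point, so it can be passed as c=point_colors in a single ax.scatter call, with no loop needed.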
