I'm new to data science and I'm practicing with a dataset. This is what I'm trying to do:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
file = pd.read_csv('datasets/office_episodes.csv')
x = np.array(file.loc[:,'episode_number'])
y = np.array(file.loc[:, 'viewership_mil'])
scaled_ratings = np.array(file.loc[:, 'scaled_ratings'])
ratings2 = list(scaled_ratings)
plt.title("Popularity, Quality, and Guest Appearances on the Office")
plt.xlabel("Episode Number")
plt.ylabel("Viewership (Millions)")
for i in ratings2:
    if i < 0.25:
        plt.scatter(x, y, c='red')
    elif i >= 0.25 and i < 0.50:
        plt.scatter(x, y, c='orange')
    elif i >= 0.50 and i < 0.75:
        plt.scatter(x, y, c='lightgreen')
    elif i >= 0.75:
        plt.scatter(x, y, c='darkgreen')
    else:
        plt.scatter(x, y, c='pink')
plt.show()
As you can see, in the for loop I'm setting the color of the dots in the scatter plot based on the scaled ratings, but when the plot is displayed it looks like this:
I also tried creating a variable called ratings3, built from ratings2 with a list comprehension, so that I could pass ratings3 to the color parameter of the plt.scatter() function.
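For reference, the list-comprehension idea might look roughly like the sketch below. It is only a sketch: rating_to_color and plot_episodes are hypothetical helper names, the thresholds are copied from the loop above, and the sample values are made up rather than taken from office_episodes.csv. The point is that each rating is mapped to one color, and plt.scatter() is then called once with the whole list instead of once per rating.

```python
def rating_to_color(rating):
    """Map one scaled rating to a color name (hypothetical helper).

    Thresholds mirror the if/elif chain in the loop above.
    """
    if rating < 0.25:
        return "red"
    if rating < 0.50:
        return "orange"
    if rating < 0.75:
        return "lightgreen"
    if rating >= 0.75:
        return "darkgreen"
    # Reached only when every comparison is False, e.g. for NaN.
    return "pink"


def plot_episodes(x, y, ratings):
    """Draw every episode once, with one color per point."""
    # Imported here so the color helper above can be used on its own.
    import matplotlib.pyplot as plt

    colors = [rating_to_color(r) for r in ratings]
    plt.scatter(x, y, c=colors)  # c takes a list with one color per point
    plt.title("Popularity, Quality, and Guest Appearances on the Office")
    plt.xlabel("Episode Number")
    plt.ylabel("Viewership (Millions)")
    plt.show()


# Made-up ratings, only to show the mapping.
sample_ratings = [0.10, 0.25, 0.60, 0.90]
sample_colors = [rating_to_color(r) for r in sample_ratings]
print(sample_colors)
```

With the variables from the code above, the call would be plot_episodes(x, y, ratings2).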