I have a dataset with about 9800 entries. One column contains user names (about 60 individual user names). I want to generate a scatter plot in matplotlib and assign different colors to different users.

This is basically what I do:

import matplotlib.pyplot as plt
import pandas as pd

x = [5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30]
y = [100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600]
users =['mark', 'mark', 'mark', 'rachel', 'rachel', 'rachel', 'jeff', 'jeff', 'jeff', 'lauren', 'lauren', 'lauren']

# this is basically what the dataframe looks like
df = pd.DataFrame(dict(x=x, y=y, users=users))

# I then add a colors column to the df manually
# I'll just do it the easy, albeit slow, way here

colors =['red', 'red', 'red', 'green', 'green', 'green', 'blue', 'blue', 'blue', 'yellow', 'yellow', 'yellow']

# this is the dataframe I use for plotting
df1 = pd.DataFrame(dict(x=x, y=y, users=users, colors=colors))

plt.scatter(df1.x, df1.y, c=df1.colors, alpha=0.5)
plt.show()

However, I don't want to assign colors to the users manually. I have to do this many times in the coming weeks and the users are going to be different every time.

I have two questions:

(1) Is there a way to assign colors automatically to the individual users? (2) If so, is there a way to assign a color scheme or palette?

Rachel
  • Possible duplicate of [Scatter plots in Pandas/Pyplot: How to plot by category](http://stackoverflow.com/questions/21654635/scatter-plots-in-pandas-pyplot-how-to-plot-by-category) – tmdavison Jan 04 '17 at 14:44
  • @tom I don't think so. I need a way to assign a color column to the data frame dynamically. The question you suggest relates to grouped plots and not the color. – Rachel Jan 04 '17 at 14:47

1 Answer

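user_colors = {}
# sorted() keeps the user-to-color assignment stable between runs
unique_users = sorted(set(users))
# spread the colors evenly over the 24-bit RGB range
step_size = (256**3) // len(unique_users)
for i, user in enumerate(unique_users):
    # zero-pad to six hex digits so every value is a valid '#rrggbb' color
    user_colors[user] = '#{:06x}'.format(step_size * i)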

Then you've got a dictionary (user_colors) in which each user has one unique color.

colors = [user_colors[user] for user in users]

Now you've got a list with a distinct color for each user, in the same order as users.
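For the pandas case, and for question (2) about palettes, here is a minimal sketch rather than a definitive recipe. It assumes the dataframe has the `x`, `y` and `users` columns from the question. The colormap name `'viridis'` and the names `palette` and `colors` are my own choices, so swap in whatever suits you. It samples one color per user from a matplotlib colormap and attaches it to the dataframe with `Series.map`.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.colors import to_hex

# Sample data with the same layout as in the question
df = pd.DataFrame(dict(
    x=[5, 10, 20, 30, 5, 10, 20, 30, 5, 10, 20, 30],
    y=[100, 100, 200, 200, 300, 300, 400, 400, 500, 500, 600, 600],
    users=['mark', 'mark', 'mark', 'rachel', 'rachel', 'rachel',
           'jeff', 'jeff', 'jeff', 'lauren', 'lauren', 'lauren'],
))

# Sorted unique user names, so the assignment is reproducible
unique_users = sorted(df['users'].unique())

# 'viridis' is an assumed choice; any matplotlib colormap name works here
cmap = plt.get_cmap('viridis')

# Sample the colormap at evenly spaced positions in [0, 1], one per user
n = len(unique_users)
palette = {
    user: to_hex(cmap(i / max(n - 1, 1)))
    for i, user in enumerate(unique_users)
}

# Look up each row's color from its user name
df['colors'] = df['users'].map(palette)

fig, ax = plt.subplots()
ax.scatter(df['x'], df['y'], c=df['colors'].tolist(), alpha=0.5)
```

Call `plt.show()` afterwards to display the figure. The same `map` call also works with the `user_colors` dictionary from above, i.e. `df['colors'] = df['users'].map(user_colors)`. With around 60 users, neighbouring colors in a continuous colormap will look fairly similar, so the plot separates groups better than it identifies individual users.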

  • Thank you! I think I understand what you do. However, can I apply it to a pandas data frame as well? How would that work? – Rachel Jan 04 '17 at 13:52