I've made a dataframe that has dates and 2 values that looks like:

Date          Year        Level        Price
2008-01-01    2008        56           11
2008-01-03    2008        10           12
2008-01-05    2008        52           13
2008-02-01    2008        66           14
2008-05-01    2008        20           10
..
2009-01-01    2009        12           11
2009-02-01    2009        70           11
2009-02-05    2009        56           12
..
2018-01-01    2018        56           10
2018-01-11    2018        10           17
..

I'm able to color the points by year by creating a year column with df['Year'] = df['Date'].dt.year, but I also want the legend to have a label for each year.

My code right now for plotting by year looks like:

import matplotlib
import matplotlib.pyplot as plt

colors = ['turquoise','orange','red','mediumblue', 'orchid', 'limegreen']

fig = plt.figure(figsize=(15,10))
ax = fig.add_subplot(111)

ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))
plt.title('Title', fontsize=16)
plt.ylabel('Level', fontsize=14)
plt.xlabel('Price', fontsize=14)
plt.legend(loc='upper left', prop={'size': 12});
plt.show()

How can I adjust the labels in the legend to show the year? The way I've done it is just using the Year column but that obviously just gives me results like this:

enter image description here

HelloToEarth

1 Answer

When you draw the scatter plot, make sure that every column you access actually exists in your dataframe. In your code, you are trying to access a column called 'Year', which doesn't appear to exist. The problem is in this line:

ax.scatter(df['Price'], df['Level'], s=10, c=df['Year'], marker="o", label=df['Year'], cmap=matplotlib.colors.ListedColormap(colors))

In this line of code, the color argument (c) refers to a column that doesn't exist, and the label argument has the same problem. To solve this, you need to create a column that contains the year:

  1. Extract all the dates
  2. Grab just the year from each date
  3. Add this to your dataframe

Below is some code to implement these steps:

# Create an array of all the dates
dates = df['Date'].values

# Create a list of all of the years using a list comprehension
# (works for 'YYYY-MM-DD' strings and for datetime64 values)
years = [int(str(d).split('-')[0]) for d in dates]

# Add this column to your dataframe
df['Year'] = years
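
Once a 'Year' column exists, one possible way to get a separate legend entry for each year is to call scatter once per year instead of once for the whole dataframe, since label names a whole scatter call rather than individual points. The sketch below is only an illustration under stated assumptions: the sample rows are made up, plot_by_year is a hypothetical helper name, and the colors are reused in order if there are more years than colors.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_by_year(df, colors):
    """Scatter Price vs. Level with one legend entry per year (hypothetical helper)."""
    fig, ax = plt.subplots(figsize=(15, 10))
    # groupby yields the years in sorted order; each scatter call creates
    # its own artist, so each year gets its own legend label.
    for i, (year, group) in enumerate(df.groupby("Year")):
        ax.scatter(group["Price"], group["Level"], s=10, marker="o",
                   color=colors[i % len(colors)],  # reuse colors if they run out
                   label=str(year))
    ax.set_title("Title", fontsize=16)
    ax.set_xlabel("Price", fontsize=14)
    ax.set_ylabel("Level", fontsize=14)
    ax.legend(loc="upper left", prop={"size": 12})
    return fig, ax


# Made-up sample rows in the same shape as the dataframe in the question
df = pd.DataFrame({
    "Date": pd.to_datetime(["2008-01-01", "2008-01-03", "2009-01-01",
                            "2009-02-01", "2018-01-01", "2018-01-11"]),
    "Level": [56, 10, 12, 70, 56, 10],
    "Price": [11, 12, 11, 11, 10, 17],
})
df["Year"] = df["Date"].dt.year

colors = ["turquoise", "orange", "red", "mediumblue", "orchid", "limegreen"]
fig, ax = plot_by_year(df, colors)
```

Calling plt.show() afterwards displays the figure. The trade-off is that the plot is built from several scatter artists instead of one, which is usually fine for a handful of years.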

I would also direct you to this course to learn more about plotting in Python: https://exlskills.com/learn-en/courses/python-data-modeling-intro-for-machine-learning-python_modeling_for_machine_learning/content

  • I actually did already do this (but did not show the Year column in the original post). The problem is not getting the years but in labeling the legend by the year. – HelloToEarth Nov 26 '18 at 18:02
  • Hmmm, I understand your predicament, and I am surprised that it still isn't working. I will look into it and see if I can find a solution. – Elliott Saslow Nov 26 '18 at 20:03