I have a dataset:
a b c d
10-Apr-86 Jimmy 1 Silly.doc
11-Apr-86 Minnie 2 Lala.doc
12-Apr-86 Jimmy 3 Goofy.doc
13-Apr-86 Minnie 4 Hilarious.doc
14-Apr-86 Jimmy 5 Joyous.doc
15-Apr-86 Eliot 6 Crackingup.doc
16-Apr-86 Jimmy 7 Funny.doc
17-Apr-86 Eliot 8 Happy.doc
18-Apr-86 Minnie 9 Mirthful.doc
I am using the following code in Python 2.7.12:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('python.csv')
# One column per name in b, indexed by date, holding the values from c
df_wanted = pd.pivot_table(
    df,
    index='a',
    columns='b',
    values='c')
df_wanted.index = pd.to_datetime(df_wanted.index)
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(df_wanted.index, df_wanted['Jimmy'], s=50, c='b', marker="s")
ax1.scatter(df_wanted.index, df_wanted['Minnie'], s=50, c='r', marker="o")
ax1.scatter(df_wanted.index, df_wanted['Eliot'], s=50, c='g', marker="8")
plt.legend(loc='upper left')
# Label every point with its document name from column d
for k, v in df.set_index('a').iterrows():
    plt.text(k, v['c'], v['d'])
plt.show()
With it I can create a scatter plot in matplotlib in which every point is annotated with its document name from column d.
The problem: this is only a toy dataset. When I apply this code to my real dataset, which has more than 3000 points, all the data labels blend together in a black illegible block.
I would like to avoid this problem by using the code here to make the data labels appear when they are clicked.
The issue I'm having is with this part of the above-mentioned code:
x=[1,2,3,4,5]
y=[6,7,8,9,10]
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
scat = ax.scatter(x, y)
DataCursor(scat, x, y)
plt.show()
Obviously, I need to replace the "x" and "y" with my pivot table columns, but I don't know how to make scat = ax.scatter(x, y) or DataCursor(scat, x, y) work with my data.
I tried the following:
scat = ax1.scatter(df_wanted.index, df_wanted['Minnie'], s=50, c='b', marker="s")
scat1 = ax1.scatter(df_wanted.index,df_wanted['Jimmy'], s=50, c='r', marker="o")
scat2 = ax1.scatter(df_wanted.index,df_wanted['Eliot'], s=50, c='g', marker="8")
DataCursor(scat,df_wanted.index,df_wanted['Minnie'])
DataCursor(scat1,df_wanted.index,df_wanted['Jimmy'])
DataCursor(scat2,df_wanted.index,df_wanted['Eliot'])
plt.show()
But I get this error: TypeError: Invalid Type Promotion
UPDATE: I used the code from here to get the doc name in the console:
from matplotlib.pyplot import figure, show
import numpy as npy
from numpy.random import rand
import pandas as pd
df = (pd.read_csv('python.csv'))
df_wanted = pd.pivot_table(
df,
index='a',
columns='b',
values='c')
df_wanted.index = pd.to_datetime(df_wanted.index)
if 1:  # picking on a scatter plot (matplotlib.collections.RegularPolyCollection)
    c = 'r'
    c1 = 'b'
    c2 = 'g'
    s = 85
    y = df_wanted['Minnie']
    z = df_wanted['Jimmy']
    f = df_wanted['Eliot']
    x = df_wanted.index

    def onpick3(event):
        # event.ind holds the indices of the picked points
        ind = event.ind
        # ind is looked up in the full, unpivoted d column
        print npy.take(df['d'], ind)

    fig = figure()
    ax1 = fig.add_subplot(111)
    col = ax1.scatter(x, y, s, c, picker=True)
    ax2 = fig.add_subplot(111)
    col = ax1.scatter(x, z, s, c1, picker=True)
    ax3 = fig.add_subplot(111)
    col = ax1.scatter(x, f, s, c2, picker=True)
    ax1.legend(loc='upper left')  # plt is not imported in this script
    #fig.savefig('pscoll.eps')
    fig.canvas.mpl_connect('pick_event', onpick3)

show()
The problem now is that the document name being returned is not accurate. I think this is because the ind number is counted within each individual series, while I look it up in the full df['d'] column. I need a way to combine all the series and assign an ind number across their total.
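To make the suspected mismatch concrete, here is a minimal sketch of one possible way around it. It assumes, as guessed above, that event.ind counts points within the picked scatter series only. Instead of one combined index, it keeps a separate list of document names per series and looks ind up in the list belonging to event.artist. The names ROWS, group_by_name, docs_for_pick and plot_with_picking are hypothetical helpers made up for this illustration; the rows are copied from the toy dataset, and the groups are built from the raw rows rather than from the pivot table so that each series has no NaN gaps.

```python
from collections import OrderedDict
from datetime import datetime

# Rows of the toy dataset above, as (a, b, c, d) tuples.
ROWS = [
    ("10-Apr-86", "Jimmy", 1, "Silly.doc"),
    ("11-Apr-86", "Minnie", 2, "Lala.doc"),
    ("12-Apr-86", "Jimmy", 3, "Goofy.doc"),
    ("13-Apr-86", "Minnie", 4, "Hilarious.doc"),
    ("14-Apr-86", "Jimmy", 5, "Joyous.doc"),
    ("15-Apr-86", "Eliot", 6, "Crackingup.doc"),
    ("16-Apr-86", "Jimmy", 7, "Funny.doc"),
    ("17-Apr-86", "Eliot", 8, "Happy.doc"),
    ("18-Apr-86", "Minnie", 9, "Mirthful.doc"),
]


def group_by_name(rows):
    """Return {name: (dates, values, docs)}, one entry per distinct value of b."""
    groups = OrderedDict()
    for a, b, c, d in rows:
        dates, values, docs = groups.setdefault(b, ([], [], []))
        dates.append(datetime.strptime(a, "%d-%b-%y"))
        values.append(c)
        docs.append(d)
    return groups


def docs_for_pick(docs_by_artist, event):
    """Translate a pick event into document names.

    Assumption: event.ind indexes the points of event.artist only, so the
    lookup goes through that artist's own list of documents.
    """
    docs = docs_by_artist[event.artist]
    return [docs[i] for i in event.ind]


def plot_with_picking(rows):
    """Hypothetical wiring: one scatter call per name, one doc list per artist."""
    import matplotlib.pyplot as plt  # imported here so the helpers above work without it

    fig, ax = plt.subplots()
    docs_by_artist = {}
    for name, (dates, values, docs) in group_by_name(rows).items():
        artist = ax.scatter(dates, values, s=50, label=name, picker=True)
        docs_by_artist[artist] = docs  # the artist returned by scatter is the key

    def onpick(event):
        print(docs_for_pick(docs_by_artist, event))

    fig.canvas.mpl_connect('pick_event', onpick)
    ax.legend(loc='upper left')
    plt.show()
```

Calling plot_with_picking(ROWS) should open the plot and print the picked document names to the console. For the real dataset, the tuples could presumably come from df[['a', 'b', 'c', 'd']].itertuples(index=False) instead of the hard-coded list.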