I am working on a project to find the similarity between two sentences/documents using the tf-idf measure.

My question is how I can show the similarity in a graphical/visual format: something like a Venn diagram where the intersection represents the similarity measure, or any other plot available in matplotlib or another Python library.

I tried the following code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity  

documents = (
"The sky is blue",
"The sun is bright"

)
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform(documents)
print(tfidf_matrix)
cosine = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix)
print(cosine)
import matplotlib.pyplot as plt
r=25
d1 = 2 * r * (1 - cosine[0][0])
circle1=plt.Circle((0,0),d1/2,color='r')
d2 = 2 * r * (1 - cosine[0][1])
circle2=plt.Circle((r,0),d2/2,color="b")
fig = plt.gcf()
fig.gca().add_artist(circle1)
fig.gca().add_artist(circle2)
fig.savefig('plotcircles.png')
plt.show()

But the plot I got was empty. Can someone explain what the error might be?

Source for the circle-plotting code: plot a circle

Coder 477
  • If you look at the axes of the figure and then print the values of d1, d2 and r, you'll quickly notice that the first circle has a diameter of 0 (at least, when I ran this code) and the second one falls completely outside the graph borders. –  Dec 23 '14 at 12:25
  • Adding the following just before `savefig` can fix this, though I guess there are better ways (and, of course, this won't show the circle with 0 radius anyway): `fig.axes[0].axis([min(-d1/2, r-d2/2), max(d1/2, r+d2/2), min(-d1/2, -d2/2), max(d1/2, d2/2)])`. –  Dec 23 '14 at 12:30
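
The diagnosis in the comments can be checked without plotting anything. The sketch below is only an illustration: the similarity value 0.34 is an assumed number (the real `cosine[0][1]` depends on the corpus), and `circle_box_overlap` is a hypothetical helper, not a matplotlib function. It relies on the fact that freshly created axes span 0 to 1 in both directions.

```python
def circle_box_overlap(cx, cy, radius, xmin, xmax, ymin, ymax):
    """Return True if a circle with positive radius touches the axis-aligned box."""
    if radius <= 0:
        return False  # a zero-radius circle draws nothing
    # Closest point of the box to the circle centre.
    nearest_x = min(max(cx, xmin), xmax)
    nearest_y = min(max(cy, ymin), ymax)
    return (cx - nearest_x) ** 2 + (cy - nearest_y) ** 2 <= radius ** 2


r = 25
cos_self = 1.0    # a document compared with itself
cos_other = 0.34  # ASSUMED similarity to the other document, for illustration

d1 = 2 * r * (1 - cos_self)   # diameter of the first circle, as in the question
d2 = 2 * r * (1 - cos_other)  # diameter of the second circle
print("diameters:", d1, d2)

# Default view of new axes: x in [0, 1], y in [0, 1].
view = (0.0, 1.0, 0.0, 1.0)
visible1 = circle_box_overlap(0, 0, d1 / 2, *view)
visible2 = circle_box_overlap(r, 0, d2 / 2, *view)
print("circle 1 visible:", visible1)
print("circle 2 visible:", visible2)
```

With these assumed numbers neither circle intersects the default view, which matches the empty plot.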

2 Answers

Just to explain what's going on, here's a stand-alone example of your problem (if the circle lies entirely outside the axes limits, nothing is shown):

import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots()
circ = Circle((1, 1), 0.5)
ax.add_artist(circ)
plt.show()

When you manually add an artist through add_artist, add_patch, etc., autoscaling isn't applied unless you explicitly request it. You're accessing the lower-level interface of matplotlib on which the higher-level functions (e.g. plot) are built. However, this is also the easiest way to add a single circle in data coordinates, so the lower-level interface is what you want in this case.

Furthermore, add_artist is too general for this. You actually want add_patch (plt.Circle is matplotlib.patches.Circle). The difference between add_artist and add_patch may seem arbitrary, but add_patch has extra logic to calculate the extent of a patch for autoscaling, whereas add_artist is the "bare" lower-level function that can take any artist, but doesn't do anything special. Autoscaling won't work correctly for a patch if you add it with add_artist.
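
One way to see the difference is to look at `ax.dataLim`, the bounding box of the data that autoscaling works from. This is a minimal sketch rather than a full explanation of the internals; `data_limits_after` is a hypothetical helper name, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use windows
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Circle


def data_limits_after(method_name):
    """Add a circle with ax.<method_name> and return the resulting data limits."""
    fig, ax = plt.subplots()
    circ = Circle((1, 1), 0.5)
    getattr(ax, method_name)(circ)  # ax.add_artist(circ) or ax.add_patch(circ)
    limits = ax.dataLim.get_points().copy()
    plt.close(fig)
    return limits


# Bounding box of a circle centred at (1, 1) with radius 0.5.
expected = np.array([[0.5, 0.5], [1.5, 1.5]])

limits_artist = data_limits_after("add_artist")
limits_patch = data_limits_after("add_patch")

print("add_artist tracked the circle:", np.allclose(limits_artist, expected, atol=1e-6))
print("add_patch tracked the circle: ", np.allclose(limits_patch, expected, atol=1e-6))
```

Only the add_patch call records the circle's extent, which is why a later autoscale has something to work with.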

To autoscale the plot based on the artists that you've added, call ax.autoscale(). As a quick example of autoscaling a manually added patch:

import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots()
circ = Circle((1, 1), 0.5)
ax.add_patch(circ)
ax.autoscale()
plt.show()

Your next question might be "why isn't the circle round?". It is, in data coordinates. However, the x and y scales of the plot (this is the aspect ratio, in matplotlib terminology) currently differ. To force them to be the same, call ax.axis('equal') or ax.axis('scaled'). We can actually leave out the call to autoscale in this case, as ax.axis('scaled') or ax.axis('equal') will effectively call it for us:

import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots()
circ = Circle((1, 1), 0.5)
ax.add_patch(circ)
ax.axis('scaled')
plt.show()
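
If you prefer to spell the two steps out, something along these lines should give a similar picture. It is a sketch, not a claim that it is identical to ax.axis('scaled') (which, as far as I know, also turns further autoscaling off). The Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use windows
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots()
ax.add_patch(Circle((1, 1), 0.5))

ax.set_aspect("equal")  # one data unit is drawn the same length on x and y
ax.autoscale()          # fit the view limits to the patch added above

xlim = ax.get_xlim()
ylim = ax.get_ylim()
print("aspect:", ax.get_aspect())
print("x limits:", xlim)
print("y limits:", ylim)
# Call plt.show() here when running with an interactive backend.
```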


Joe Kington

The plots are not empty; I suspect your circles are just too big!

I don't have sklearn installed, so I start at the point where you print cosine:

import matplotlib.pyplot as plt

## set constants (cosine is the array computed in the question)
r = 1
d = 2 * r * (1 - cosine[0][1])

## draw circles
circle1=plt.Circle((0, 0), r, alpha=.5)
circle2=plt.Circle((d, 0), r, alpha=.5)
## set axis limits
plt.ylim([-1.1, 1.1])
plt.xlim([-1.1, 1.1 + d])
fig = plt.gcf()
fig.gca().add_artist(circle1)
fig.gca().add_artist(circle2)
## hide axes if you like
# fig.gca().get_xaxis().set_visible(False)
# fig.gca().get_yaxis().set_visible(False)
fig.savefig('venn_diagramm.png')

That also answers your other question, where I also added this piece of code!
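
For what it's worth, the same idea can be wrapped in a small function so the axis limits don't have to be set by hand. This is only a sketch under a few assumptions: `draw_similarity_circles` is a hypothetical name, the similarity 0.34 is an illustrative value rather than the output of the question's code, and the limits are left to add_patch plus axis('scaled'). The Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use windows
import matplotlib.pyplot as plt
from matplotlib.patches import Circle


def draw_similarity_circles(similarity, r=1.0):
    """Draw two circles of radius r whose overlap grows with the similarity."""
    similarity = min(max(similarity, 0.0), 1.0)  # clamp to [0, 1]
    d = 2 * r * (1 - similarity)  # centre distance: 0 when identical, 2r when disjoint
    fig, ax = plt.subplots()
    ax.add_patch(Circle((0, 0), r, alpha=0.5, color="r"))
    ax.add_patch(Circle((d, 0), r, alpha=0.5, color="b"))
    ax.axis("scaled")  # equal aspect ratio and limits fitted to both patches
    ax.set_title("cosine similarity = {:.2f}".format(similarity))
    return fig, ax, d


fig, ax, d = draw_similarity_circles(0.34)  # 0.34 is an ASSUMED example value
xlim = ax.get_xlim()
print("centre distance:", d)
print("x limits:", xlim)
fig.savefig("venn_diagramm.png")
```

In practice you would pass `cosine[0][1]` from the question's code instead of the hard-coded value.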

jkalden