
I have a binary matrix with the following structure:

    import pandas as pd

    df = pd.DataFrame({"col1": [0, 1, 0, 1, 1],
                       "col2": [1, 0, 1, 0, 0],
                       "col3": [1, 1, 1, 0, 1],
                       "col4": [1, 0, 0, 1, 0]},
                      index=['a', 'b', 'c', 'd', 'e'])

This is the current df.

I am applying a similarity measure (Jaccard distance) to each pair of rows, and in the end I want an item-item matrix like the one below. (The cell values shown here are placeholders; the real values should come from the jaccard function.) The final outcome should look like this:

  a  b  c  d  e
a 0  3  2  1  1
b    0  1  2  3 
c       0  1  4
d          0  2 
e             0 
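For reference, the usual Jaccard distance treats each binary row as the set of columns that hold a 1 (the Jaccard similarity is one minus this distance). Worked on rows `a` and `b` of `df`, which are a=[0,1,1,1] and b=[1,0,1,0]:

```latex
\begin{aligned}
d_J(A, B) &= 1 - \frac{|A \cap B|}{|A \cup B|} \\
A &= \{\mathrm{col2}, \mathrm{col3}, \mathrm{col4}\}, \quad B = \{\mathrm{col1}, \mathrm{col3}\} \\
d_J(A, B) &= 1 - \tfrac{1}{4} = \tfrac{3}{4} \quad (|A \cap B| = 1,\ |A \cup B| = 4)
\end{aligned}
```

So under this definition the a-b cell would hold 0.75 rather than the placeholder 3.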

I already have the Jaccard similarity function defined as jaccard(). I only want to know how to apply it to df so that I end up with this kind of matrix. Thank you!
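One way to build that layout is to loop over every pair of row labels and call the distance function on the two rows. This is a minimal sketch, assuming a hypothetical `jaccard_distance(list1, list2)` that stands in for the asker's own `jaccard()` / `jaccard_custom()`; it fills the diagonal and upper triangle and leaves NaN below the diagonal to mirror the blank cells in the target matrix.

```python
from itertools import combinations_with_replacement

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"col1": [0, 1, 0, 1, 1],
     "col2": [1, 0, 1, 0, 0],
     "col3": [1, 1, 1, 0, 1],
     "col4": [1, 0, 0, 1, 0]},
    index=["a", "b", "c", "d", "e"],
)


def jaccard_distance(list1, list2):
    """Hypothetical stand-in for the asker's jaccard_custom(list1, list2).

    Jaccard distance of two binary vectors: 1 - |intersection| / |union|.
    """
    both = sum(1 for x, y in zip(list1, list2) if x and y)
    either = sum(1 for x, y in zip(list1, list2) if x or y)
    # Two all-zero rows have an empty union; treat them as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def pairwise_matrix(frame, func):
    """Item-item matrix: func(row_i, row_j) on and above the diagonal, NaN below."""
    labels = list(frame.index)
    out = pd.DataFrame(np.nan, index=labels, columns=labels)
    # (a, a), (a, b), ..., (a, e), (b, b), ...: each unordered pair once.
    for i, j in combinations_with_replacement(labels, 2):
        out.loc[i, j] = func(frame.loc[i].tolist(), frame.loc[j].tolist())
    return out


result = pairwise_matrix(df, jaccard_distance)
print(result.round(2))
```

Passing the real function in place of `jaccard_distance` should work as long as it accepts two lists and returns a number. To get a full symmetric matrix instead, also assign `out.loc[j, i]` inside the loop. Because this calls a Python function once per pair, it suits small frames; for larger ones the `pdist` route suggested in the comments is likely faster.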

Isura Nirmal
  • I don't understand. What is your input df, and what is the expected output df? – jezrael Oct 03 '15 at 05:53
  • @jezrael I edited the question to show that `df` is the input and the final item-item matrix is the expected output after applying the `jaccard` function. – Isura Nirmal Oct 03 '15 at 06:28
  • So basically you want to create a distance matrix? scipy's [`pdist`](http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html) does exactly that. – cel Oct 03 '15 at 06:36
  • Actually, the distance functions are already implemented. I want the results in the representation shown above, so that each cell of the matrix contains the distance between two items. – Isura Nirmal Oct 03 '15 at 06:46
  • So you need to apply the `jaccard` function to `df`? What are the input and output of `jaccard()`? Can you use df.apply or df.applymap? [src](http://stackoverflow.com/questions/19798153/difference-between-map-applymap-and-apply-methods-in-pandas) Or do you need to create `jaccard_custom()`? – jezrael Oct 03 '15 at 06:48
  • @jezrael Hi, this is not a problem with the jaccard function. I have already written it, and it accepts two lists. Suppose rows `a` and `b` are a=[0,1,1,1] and b=[1,0,1,0]. In the end I want the distance calculated by the function to be stored so that all the distances a-b, a-c, a-d, a-e are available in one matrix, as shown after my sentence "Final outcome should be like this." That is what I want. I have `jaccard_custom(list1,list2)` with me. – Isura Nirmal Oct 03 '15 at 07:58
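Following cel's `pdist` suggestion, a sketch with SciPy: `pdist` computes the condensed pairwise distances and `squareform` expands them into the square item-item matrix. `metric="jaccard"` uses SciPy's built-in Jaccard distance on boolean rows (a dissimilarity, so similarity would be `1 - dist`). The `jaccard_custom` defined here is a hypothetical callable standing in for the asker's own function; `pdist` passes it two 1-D NumPy arrays rather than lists.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

df = pd.DataFrame(
    {"col1": [0, 1, 0, 1, 1],
     "col2": [1, 0, 1, 0, 0],
     "col3": [1, 1, 1, 0, 1],
     "col4": [1, 0, 0, 1, 0]},
    index=["a", "b", "c", "d", "e"],
)

# Built-in metric: SciPy's Jaccard distance, computed on boolean rows.
condensed = pdist(df.to_numpy(dtype=bool), metric="jaccard")
dist = pd.DataFrame(squareform(condensed), index=df.index, columns=df.index)


def jaccard_custom(u, v):
    """Hypothetical user-defined Jaccard distance; pdist passes 1-D arrays."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    union = np.count_nonzero(u | v)
    # Two all-zero rows have an empty union; treat them as identical.
    return 0.0 if union == 0 else 1.0 - np.count_nonzero(u & v) / union


# Any callable that takes two rows and returns a float can be the metric.
custom = pd.DataFrame(
    squareform(pdist(df.to_numpy(), metric=jaccard_custom)),
    index=df.index,
    columns=df.index,
)

# Keep only the diagonal and upper triangle, like the target layout.
upper = dist.where(np.triu(np.ones(dist.shape, dtype=bool)))
print(upper.round(2))
```

If the real `jaccard_custom` strictly requires Python lists, wrapping it should be enough, e.g. `metric=lambda u, v: jaccard_custom(list(u), list(v))`.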

0 Answers