When I run sklearn.metrics.pairwise.cosine_similarity, the result comes back with index 0, 1, 2... and column names 0, 1, 2...

How can I label the result with the original user_id values as both the index and the column names?

Dataframe for calculation:

    user_id  |    age      |  education   |   income    | length_residence
    -----------------------------------------------------------------------
    NIODB6S3 |  43.769912  |  1.537634    |  58.754647  |     7.232344
    BOAWG65L |  43.769912  |  1.537634    |  58.754647  |     7.232344
    3667B8P0 |  20.000000  |  1.000000    |  40.000000  |     4.000000
    VS53SKY5 |  35.000000  |  1.537634    |  75.000000  |    14.000000

Code I ran:

    pd.DataFrame(cosine_similarity(df))

Expected:

    user_id  |  NIODB6S3  | BOAWG65L  | 3667B8P0
    user_id  |
    ----------------------------------------------
    NIODB6S3 |  1.000000  | 0.000084  | 0.996848
    BOAWG65L |  0.000084  | 1.000000  | 0.000342
    3667B8P0 |  0.996848  | 0.000342  | 1.000000

Got:

      |     0     |    1      |     2
    --------------------------------------
    0 | 1.000000  | 0.000084  | 0.996848
    1 | 0.000084  | 1.000000  | 0.000342
    2 | 0.996848  | 0.000342  | 1.000000

I'm not sure whether the default numeric index preserves the original order of 'user_id' in df.

Samuel Philipp
Goldfish

1 Answer

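I checked the values against the approach from "Cosine Similarity between 2 Number Lists":

    scipy.spatial.distance.cosine(array1, array2)

Note that this function returns the cosine distance, so the similarity is 1 minus its result.

Having confirmed that, I can replace the index and columns of the result with the original index (this assumes user_id is the index of df):

    result.index = df.index
    result.columns = df.index

cosine_similarity returns one row and one column per input row, in input order, so the labels are in exactly the same order as df.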
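For completeness, here is a minimal sketch of the whole round trip. It rebuilds the four sample rows from the question and assumes user_id has been set as the index of df, which the question does not show explicitly. The labels are passed straight to the DataFrame constructor instead of being assigned afterwards, and one pair is spot-checked against scipy. The variable names a, b and check are only for illustration.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_similarity

# Sample rows from the question; user_id is assumed to be the index.
df = pd.DataFrame(
    {
        "age": [43.769912, 43.769912, 20.0, 35.0],
        "education": [1.537634, 1.537634, 1.0, 1.537634],
        "income": [58.754647, 58.754647, 40.0, 75.0],
        "length_residence": [7.232344, 7.232344, 4.0, 14.0],
    },
    index=pd.Index(
        ["NIODB6S3", "BOAWG65L", "3667B8P0", "VS53SKY5"], name="user_id"
    ),
)

# Label rows and columns with the original index in one step.
result = pd.DataFrame(cosine_similarity(df), index=df.index, columns=df.index)

# Spot-check one pair: scipy returns a distance, so similarity = 1 - distance.
a, b = "NIODB6S3", "3667B8P0"
check = 1 - cosine(df.loc[a].to_numpy(), df.loc[b].to_numpy())
assert np.isclose(result.loc[a, b], check)

print(result)
```

If user_id is still an ordinary column, calling df.set_index("user_id") first gives the same layout and keeps the string column out of the numeric input.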

Goldfish