I am new to pandas and to Python in general. I have been tasked with matching some products from our database. I was following a step-by-step process written here, but I got stuck on an instruction that says to correlate the dataframe: I am getting NaN values. The first two pictures show the same dataframe I want to run corr() on. The third one is the result. Can I get some guidance here, please?

[Image: dataframe_pic1]

[Image: dataframe_pic2]

[Image: result_of_df.corr()]

Amir Charkhi
Dark Coder
  • If the match column is all 1s, the correlation will give you NaN because there is no deviation. – StupidWolf Aug 02 '21 at 07:22
  • Avoid pasting images of tables and provide the code please, such as how you obtained the last dataframe before the correlation. See https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – StupidWolf Aug 02 '21 at 07:24
  • To get the last dataframe before the correlation, I had to pass my original dataframe through a series of jellyfish functions, all of which are described in the instructions on this site: https://practicaldatascience.co.uk/machine-learning/how-to-create-a-product-matching-model-using-xgboost @StupidWolf – Dark Coder Aug 02 '21 at 07:44

1 Answer

This has been answered before here.

Here is the definition of the correlation between two variables i and j:

cor(i,j) = cov(i,j)/[stdev(i)*stdev(j)]

If the values of variable i or j do not vary (in your case they are all 1, so they never change), the corresponding standard deviation is zero, and so is the denominator of the fraction. The covariance in the numerator is zero as well, so the fraction becomes 0/0, which is undefined. Thus, the correlation will be NaN.
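To see the formula produce NaN directly, here is a minimal sketch that computes cov(i,j)/[stdev(i)*stdev(j)] by hand with NumPy. The helper name `pearson` and the sample lists are made up for illustration; this is not how pandas implements `corr()` internally, only the same arithmetic written out.

```python
import numpy as np

def pearson(i, j):
    """Hand-written cor(i, j) = cov(i, j) / (stdev(i) * stdev(j)).

    Hypothetical helper for illustration. It uses population statistics
    throughout; the ratio is the same as with sample statistics as long
    as numerator and denominator are consistent.
    """
    i = np.asarray(i, dtype=float)
    j = np.asarray(j, dtype=float)
    cov = np.mean((i - i.mean()) * (j - j.mean()))
    denom = i.std() * j.std()
    # Silence the floating-point warning so the 0/0 case quietly yields NaN
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.float64(cov) / np.float64(denom)

constant = [1, 1, 1, 1]   # never varies, like an all-1s column
varying = [1, 2, 3, 4]

r_const = pearson(constant, varying)      # numerator 0, denominator 0
r_vary = pearson(varying, [2, 4, 6, 8])   # both inputs vary

print(r_const)
print(r_vary)
```

With a constant input both the covariance and the product of standard deviations are zero, so the division has no defined result. When both inputs vary, the same function returns an ordinary correlation value.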

If you try the following, you will get all NaN values:

import numpy as np
import pandas as pd

# A 4x4 dataframe in which every value is 1.0, so no column varies
dataframe = pd.DataFrame(np.ones(16).reshape(4, 4))

# Every column has zero standard deviation, so every entry of the result is NaN
dataframe.corr()
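If only some columns are constant, one possible way forward is to find them and leave them out before correlating. The sketch below assumes a small made-up feature table; the column names `match`, `score_a` and `score_b` are hypothetical and do not come from the question's data.

```python
import pandas as pd

# Hypothetical feature table: "match" is all 1s, like the column in the question
df = pd.DataFrame({
    "match": [1, 1, 1, 1],
    "score_a": [0.9, 0.4, 0.7, 0.1],
    "score_b": [0.8, 0.5, 0.6, 0.2],
})

# Columns with a single distinct value have zero standard deviation
constant_cols = [col for col in df.columns if df[col].nunique() <= 1]
print(constant_cols)

# Correlate only the columns that actually vary
corr = df.drop(columns=constant_cols).corr()
print(corr)
```

Whether dropping the column is appropriate depends on the task. If a label column such as the match flag is all 1s, that may instead be a sign that the data contains only positive examples, in which case the correlation with that column carries no information until negative examples are added.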

Amir Charkhi