Reshaping for Pearsonr correlation

Question

What is the best way to drop values and match the length of datasets when doing a pearsonr correlation?

I am currently running a pearsonr correlation on returns and various fundamental indicators. The only issue is the NaNs: when I run it with them in place I get nan, and when I dropna() I end up with datasets of different sizes and get an error regarding the shapes: operands could not be broadcast together with shapes (469099,) (539093,)

I know I can't replace them with other values, but is there any way to trim the columns so that their lengths match? — J.J., Dec 12 '18 at 20:47
You shouldn't drop `NaN` values from each array separately, otherwise you ruin pairing of observations and your correlation is meaningless. — ALollz, Dec 12 '18 at 20:52
If your two datasets are stored in different columns of the same Pandas DataFrame, `dropna()` will (by default) drop the rows in which any value is nan, which is apparently what you want. So it sounds like your datasets are not stored in a single DataFrame. In that case, you'll have to make sure that when you drop a value from one dataset, you also drop the corresponding value from the other dataset. Or, put them into two columns of one DataFrame, and then use `dropna()`. — Warren Weckesser, Dec 12 '18 at 21:05
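A minimal sketch of the approach described in the comments above, assuming the two datasets are row-aligned and equally long before any values are dropped. The names `returns`, `indicator`, `paired`, and `clean`, and the sample values, are made up for illustration and are not from the question:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical example data: two row-aligned series, each with a NaN
returns = pd.Series([0.01, np.nan, 0.03, -0.02, 0.05, 0.00])
indicator = pd.Series([1.2, 1.5, np.nan, 0.8, 2.0, 1.1])

# Put both in one DataFrame so that each row is one paired observation
paired = pd.DataFrame({"returns": returns, "indicator": indicator})

# By default dropna() removes every row in which any value is NaN,
# so both columns keep the same length and the pairing is preserved
clean = paired.dropna()

# pearsonr returns the correlation coefficient and its p-value
r, p_value = pearsonr(clean["returns"], clean["indicator"])
print(len(clean), r, p_value)
```

If the two datasets do not share an index (for example, they cover different dates), they would need to be aligned or merged on a common key first; this sketch does not cover that case.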

score 1 · Answer 1 · answered Dec 12 '18 at 20:49

The question does not make clear what you are trying to do, but I assume you want to drop `NaN` values so that both datasets match in shape. `dropna()` returns a new DataFrame by default, so either pass `inplace=True` or assign the result back to a variable.

Either

df.dropna(inplace=True)

or

df = df.dropna()
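To illustrate why the assignment matters, here is a small sketch with a made-up DataFrame (the column names and values are assumptions for illustration only). Calling `dropna()` without using its return value leaves the original untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the first row has no missing values
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

# The result is discarded here, so df still contains the NaN rows
df.dropna()
rows_before = len(df)

# Assigning the result back keeps only the complete rows
df = df.dropna()
rows_after = len(df)

print(rows_before, rows_after)
```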

You can also check: Can't drop NAN with dropna in pandas
