
What is the best way to drop NaNs and match the lengths of datasets when doing a pearsonr correlation?

I am currently running a pearsonr correlation on returns and various fundamental indicators. The only issue is the NaNs: if I leave them in, the result is nan, and if I call dropna() the datasets end up with different sizes and I get an error about the shapes: operands could not be broadcast together with shapes (469099,) (539093,)

J.J.
  • I know I can't replace them with other values, but is there any way to trim the columns so their lengths equal each other? – J.J. Dec 12 '18 at 20:47
  • You shouldn't drop `NaN` values from each array separately, otherwise you ruin pairing of observations and your correlation is meaningless. – ALollz Dec 12 '18 at 20:52
  • If your two datasets are stored in different columns of the same Pandas DataFrame, `dropna()` will (by default) drop the rows in which any value is nan, which is apparently what you want. So it sounds like your datasets are not stored in a single DataFrame. In that case, you'll have to make sure that when you drop a value from one dataset, you also drop the corresponding value from the other dataset. Or, put them into two columns of one DataFrame, and then use `dropna()`. – Warren Weckesser Dec 12 '18 at 21:05
  • Thank you, I appreciate it. – J.J. Dec 13 '18 at 13:17

1 Answer


It is not clear from the question what you are trying to do; however, I assume you are trying to drop NaN values from the data so that both sets match in shape. If you are running dropna(), make sure either to pass inplace=True as a parameter or to assign the result back to the DataFrame, because by default dropna() returns a new object and leaves the original unchanged.

Either

df.dropna(inplace = True)

or

df = df.dropna()
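Note that either form only helps if both series live in the same DataFrame, so that a row with a NaN in either column is dropped from both at once. Here is a minimal sketch of that idea, assuming the two datasets share an index; the names `returns` and `indicator` and the sample values are made up for illustration and are not taken from the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Hypothetical data: names and values are illustrative only.
returns = pd.Series([0.01, np.nan, 0.03, -0.02, 0.05])
indicator = pd.Series([1.2, 1.5, np.nan, 0.8, 2.0])

# Put both series side by side so each row is one paired observation,
# then drop every row in which either value is NaN.
paired = pd.concat(
    [returns, indicator], axis=1, keys=["returns", "indicator"]
).dropna()

# Both columns now have the same length and keep their pairing.
r, p_value = pearsonr(paired["returns"], paired["indicator"])
print(len(paired), r, p_value)
```

If the series do not share an index, they would need to be aligned on a common key first. As far as I know, pandas' own `Series.corr` also excludes missing pairs, so `returns.corr(indicator)` may be enough if you do not need the p-value.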

You can also check: Can't drop NAN with dropna in pandas

e_kapti