What I am trying to do.

I am scraping a website, building a table from the results, and loading it into the data frame below. The code produces the table shown under Output (row indices not shown).

I am trying to find the correlation between the "Open" and "Volumes" columns, row by row, using `df[['Open','Volumes']].corr()`, but this yields no output, just an underscore.

Can someone tell me where the issue is in my code?

Excerpt from code

df = pd.DataFrame({'Contracts': Contracts, 'Open': Opens, 'High': Highs, 'Low': Lows, 'Last': Lasts, 'Pct': Pcts, 'Time': Times, 'Volumes': Volumes, 'Previous Settled': ps})

Output

Contracts       Open    High    Low     Last     Pct    Time        Volumes
Oct 2018 (E)    2.810   2.814   2.762   2.767   -1.77%  09/14/18    0 
Nov 2018 (E)    2.797   2.802   2.748   2.751   -1.78%  09/14/18    132969  
Dec 2018 (E)    2.886   2.890   2.840   2.843   -1.65%  09/14/18    91025   
Jan 2019 (E)    2.974   2.981   2.930   2.934   -1.64%  09/14/18    39348   
Feb 2019 (E)    2.948   2.952   2.904   2.908   -1.62%  09/14/18    39377
  • I cannot reproduce your error. Works for me. (Getting the correlation of -0.305064). – DYZ Sep 16 '18 at 21:16
  • You mean for first row? I want to find correlation for every row – Siddharth Kulkarni Sep 16 '18 at 21:17
  • I executed the code that you posted: `df[['Open','Volumes']].corr()`. – DYZ Sep 16 '18 at 21:17
  • You probably have strings and not numbers in your data. Use `df['Open'] = pd.to_numeric(df['Open'], errors='coerce')` before using `corr()` (same for volume). Actually, it would be even better to understand *why, in the first place,* you have strings and not numbers – rafaelc Sep 16 '18 at 21:18
  • @SiddharthKulkarni there is no such thing as correlation for every row. Correlation is a scalar output from two series. Take a read [here](https://en.wikipedia.org/wiki/Correlation_and_dependence) before going into code – rafaelc Sep 16 '18 at 21:20
  • @RafaelC Even if some or all columns are non-numeric, `.corr()` would never output an underscore. – DYZ Sep 16 '18 at 21:21
  • @DYZ Yes it does (if you're using Jupyter notebook) – rafaelc Sep 16 '18 at 21:22
  • Thanks, it worked after changing to numeric. How can I accept this answer? Thank you so much, all, for giving me pointers. – Siddharth Kulkarni Sep 16 '18 at 21:23
  • @RafaelC That underscore is an empty dataframe. – DYZ Sep 16 '18 at 21:27
  • @DYZ I know, but have to communicate in beginner's language with beginners ;) – rafaelc Sep 16 '18 at 21:29
  • Why can't I find a book that explains, step by step, things like your suggestion that the data needed to be converted to numeric? Can someone suggest one? – Siddharth Kulkarni Sep 16 '18 at 21:34

1 Answer

Use

df['Open'] = pd.to_numeric(df['Open'], errors='coerce')
df['Volumes'] = pd.to_numeric(df['Volumes'], errors='coerce')

You have strings, not numbers, in your df; that's why you get `_` as output in Jupyter.
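To make this concrete, here is a minimal sketch that rebuilds the two columns from the question's output as strings (an assumption about what the scraper produced; the real values may contain other characters) and then converts them. With older pandas versions, `corr()` silently skipped non-numeric columns, which is consistent with the empty result; newer versions may behave differently, so checking `df.dtypes` first is the safer diagnostic.

```python
import pandas as pd

# Assumption: scraped values arrive as text, so the columns are built from strings
df = pd.DataFrame({
    'Open': ['2.810', '2.797', '2.886', '2.974', '2.948'],
    'Volumes': ['0', '132969', '91025', '39348', '39377'],
})

# Diagnostic: a non-numeric dtype here means the values are strings
print(df.dtypes)

# Convert each column; anything that cannot be parsed becomes NaN
for col in ['Open', 'Volumes']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)  # both columns are numeric now

# 2x2 correlation matrix for the two columns
corr_matrix = df[['Open', 'Volumes']].corr()
print(corr_matrix)

# A single number: the correlation between the two series
open_vs_volume = df['Open'].corr(df['Volumes'])
print(open_vs_volume)
```

Note that the result is one value for the whole pair of columns, not one per row, and on these five rows it comes out close to the -0.305 mentioned in the comments.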

rafaelc