2

I am currently working on a program that scrapes the Yahoo Finance Earnings Calendar page and stores the data in a file. I am able to scrape the data, but I am confused as to why it only shows the first 2 and last 2 columns. I also tried the same thing with the table on the Wikipedia page List of S&P 500 Companies and am running into the same problem. Any help is appreciated.

Yahoo Finance Code

import csv
import pandas as pd
earnings = pd.read_html('https://finance.yahoo.com/calendar/earnings?day=2019-11-19')[0]
fileName = "testFile"
with open(fileName + ".csv", mode='w') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([earnings])

print(earnings)

Wikipedia Code

import pandas as pd
url = r'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
tables = pd.read_html(url) # Returns list of all tables on page
sp500_table = tables[0] # Select table of interest
print(sp500_table)

~EDIT~

Here is the output I get from the Yahoo Finance Code

"   Symbol                                Company  ... Reported EPS  Surprise(%)
0    WUBA                             58.com Inc  ...          NaN          NaN
1    ARMK                                Aramark  ...          NaN          NaN
2    AFMD                             Affimed NV  ...          NaN          NaN
3     NJR              New Jersey Resources Corp  ...          NaN          NaN
4    ECCB         Eagle Point Credit Company Inc  ...          NaN          NaN
5    TOUR                             Tuniu Corp  ...          NaN          NaN
6     EIC         Eagle Point Income Company Inc  ...          NaN          NaN
7     KSS                             Kohls Corp  ...          NaN          NaN
8     JKS              JinkoSolar Holding Co Ltd  ...          NaN          NaN
9      DL  China Distance Education Holdings Ltd  ...          NaN          NaN
10    TJX                      TJX Companies Inc  ...          NaN          NaN
11     HD                         Home Depot Inc  ...          NaN          NaN
12   PAGS                  PagSeguro Digital Ltd  ...          NaN          NaN
13    ESE                  ESCO Technologies Inc  ...          NaN          NaN
14   RADA         Rada Electronic Industries Ltd  ...          NaN          NaN
15   RADA         Rada Electronic Industries Ltd  ...          NaN          NaN
16   DAVA                             Endava PLC  ...          NaN          NaN
17   FALC                FalconStor Software Inc  ...          NaN          NaN
18    GVP                        GSE Systems Inc  ...          NaN          NaN
19    TDG                    TransDigm Group Inc  ...          NaN          NaN
20   PPDF                        PPDAI Group Inc  ...          NaN          NaN
21   GRBX                           Greenbox Pos  ...          NaN          NaN
22   THMO             Thermogenesis Holdings Inc  ...          NaN          NaN
23    MMS                            Maximus Inc  ...          NaN          NaN
24   NXTD                             NXT-ID Inc  ...          NaN          NaN
25   URBN                   Urban Outfitters Inc  ...          NaN          NaN
26   SINT                 SINTX Technologies Inc  ...          NaN          NaN
27   ORNC                             Oranco Inc  ...          NaN          NaN
28   LAIX                               LAIX Inc  ...          NaN          NaN
29    MDT                          Medtronic PLC  ...          NaN          NaN

[30 rows x 6 columns]"


Here is the output I get from Wikipedia Code

    Symbol                         Security  ...      CIK      Founded
0      MMM                       3M Company  ...    66740         1902
1      ABT              Abbott Laboratories  ...     1800         1888
2     ABBV                      AbbVie Inc.  ...  1551152  2013 (1888)
3     ABMD                      ABIOMED Inc  ...   815094         1981
4      ACN                    Accenture plc  ...  1467373         1989
5     ATVI              Activision Blizzard  ...   718877         2008
6     ADBE                Adobe Systems Inc  ...   796343         1982
7      AMD       Advanced Micro Devices Inc  ...     2488         1969
8      AAP               Advance Auto Parts  ...  1158449         1932
9      AES                         AES Corp  ...   874761         1981
10     AMG    Affiliated Managers Group Inc  ...  1004434         1993
11     AFL                        AFLAC Inc  ...     4977         1955
12       A         Agilent Technologies Inc  ...  1090872         1999
13     APD     Air Products & Chemicals Inc  ...     2969         1940
14    AKAM          Akamai Technologies Inc  ...  1086222         1998
15     ALK             Alaska Air Group Inc  ...   766421         1985
16     ALB                   Albemarle Corp  ...   915913         1994
17     ARE  Alexandria Real Estate Equities  ...  1035443         1994
18    ALXN          Alexion Pharmaceuticals  ...   899866         1992
19    ALGN                 Align Technology  ...  1097149         1997
20    ALLE                         Allegion  ...  1579241         1908
21     AGN                    Allergan, Plc  ...  1578845         1983
22     ADS            Alliance Data Systems  ...  1101215         1996
23     LNT              Alliant Energy Corp  ...   352541         1917
24     ALL                    Allstate Corp  ...   899051         1931
25   GOOGL             Alphabet Inc Class A  ...  1652044         1998
26    GOOG             Alphabet Inc Class C  ...  1652044         1998
27      MO                 Altria Group Inc  ...   764180         1985
28    AMZN                  Amazon.com Inc.  ...  1018724         1994
29    AMCR                        Amcor plc  ...  1748790          NaN
..     ...                              ...  ...      ...          ...
475   VIAB                      Viacom Inc.  ...  1339947          NaN
476      V                        Visa Inc.  ...  1403161          NaN
477    VNO             Vornado Realty Trust  ...   899689          NaN
478    VMC                 Vulcan Materials  ...  1396009          NaN
479    WAB               Wabtec Corporation  ...   943452          NaN
480    WMT                          Walmart  ...   104169          NaN
481    WBA         Walgreens Boots Alliance  ...  1618921          NaN
482    DIS          The Walt Disney Company  ...  1001039          NaN
483     WM            Waste Management Inc.  ...   823768         1968
484    WAT               Waters Corporation  ...  1000697         1958
485    WEC             Wec Energy Group Inc  ...   783325          NaN
486    WCG                         WellCare  ...  1279363          NaN
487    WFC                      Wells Fargo  ...    72971          NaN
488   WELL                   Welltower Inc.  ...   766704          NaN
489    WDC                  Western Digital  ...   106040          NaN
490     WU                 Western Union Co  ...  1365135         1851
491    WRK                         WestRock  ...  1636023          NaN
492     WY                     Weyerhaeuser  ...   106535          NaN
493    WHR                  Whirlpool Corp.  ...   106640         1911
494    WMB                    Williams Cos.  ...   107263          NaN
495   WLTW             Willis Towers Watson  ...  1140536          NaN
496   WYNN                 Wynn Resorts Ltd  ...  1174922          NaN
497    XEL                  Xcel Energy Inc  ...    72903         1909
498    XRX                            Xerox  ...   108772         1906
499   XLNX                           Xilinx  ...   743988          NaN
500    XYL                       Xylem Inc.  ...  1524472          NaN
501    YUM                  Yum! Brands Inc  ...  1041061          NaN
502    ZBH           Zimmer Biomet Holdings  ...  1136869          NaN
503   ZION                    Zions Bancorp  ...   109380          NaN
504    ZTS                           Zoetis  ...  1555280          NaN

[505 rows x 9 columns]

As you can see, in both examples the table omits the columns in the middle and only displays the first 2 and last 2.

~EDIT#2~

Making this change to the code now displays all columns, but it does so in two separate tables instead. Any idea as to why it does this?

fileName = "yahooFinance_Pandas"
with pd.option_context('display.max_columns', None):  # more options can be specified also
    with open(fileName + ".csv", mode='w') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow([earnings])

OUTPUT

"   Symbol                                Company  Earnings Call Time  \
0    WUBA                             58.com Inc  Before Market Open   
1    ARMK                                Aramark  Before Market Open   
2    AFMD                             Affimed NV                 TAS   
3     NJR              New Jersey Resources Corp  Before Market Open   
4    ECCB         Eagle Point Credit Company Inc  Before Market Open   
5    TOUR                             Tuniu Corp  Before Market Open   
6     EIC         Eagle Point Income Company Inc  Before Market Open   
7     KSS                             Kohls Corp  Before Market Open   
8     JKS              JinkoSolar Holding Co Ltd  Before Market Open   
9      DL  China Distance Education Holdings Ltd  After Market Close   
10    TJX                      TJX Companies Inc  Before Market Open   
11     HD                         Home Depot Inc  Before Market Open   
12   PAGS                  PagSeguro Digital Ltd                 TAS   
13    ESE                  ESCO Technologies Inc  After Market Close   
14   RADA         Rada Electronic Industries Ltd                 TAS   
15   RADA         Rada Electronic Industries Ltd  Before Market Open   
16   DAVA                             Endava PLC                 TAS   
17   FALC                FalconStor Software Inc  After Market Close   
18    GVP                        GSE Systems Inc                 TAS   
19    TDG                    TransDigm Group Inc  Before Market Open   
20   PPDF                        PPDAI Group Inc  Before Market Open   
21   GRBX                           Greenbox Pos   Time Not Supplied   
22   THMO             Thermogenesis Holdings Inc  After Market Close   
23    MMS                            Maximus Inc                 TAS   
24   NXTD                             NXT-ID Inc                 TAS   
25   URBN                   Urban Outfitters Inc  After Market Close   
26   SINT                 SINTX Technologies Inc   Time Not Supplied   
27   ORNC                             Oranco Inc   Time Not Supplied   
28   LAIX                               LAIX Inc  After Market Close   
29    MDT                          Medtronic PLC                 TAS   

    EPS Estimate  Reported EPS  Surprise(%)  
0           0.82           NaN          NaN  
1           0.69           NaN          NaN  
2          -0.17           NaN          NaN  
3           0.28           NaN          NaN  
4            NaN           NaN          NaN  
5            NaN           NaN          NaN  
6            NaN           NaN          NaN  
7           0.86           NaN          NaN  
8           0.83           NaN          NaN  
9           0.33           NaN          NaN  
10          0.66           NaN          NaN  
11          2.52           NaN          NaN  
12          0.29           NaN          NaN  
13          1.06           NaN          NaN  
14         -0.02           NaN          NaN  
15         -0.02           NaN          NaN  
16         21.21           NaN          NaN  
17           NaN           NaN          NaN  
18          0.03           NaN          NaN  
19          5.16           NaN          NaN  
20          0.26           NaN          NaN  
21           NaN           NaN          NaN  
22         -0.12           NaN          NaN  
23          0.94           NaN          NaN  
24           NaN           NaN          NaN  
25          0.57           NaN          NaN  
26           NaN           NaN          NaN  
27           NaN           NaN          NaN  
28         -0.32           NaN          NaN  
29          1.28           NaN          NaN  "

~EDIT#3~

Made this change as you requested @Alex

earnings.to_csv(r'C:\Users\akkir\Desktop\pythonSelenium\export_dataframe.csv', index = None)

OUTPUT

Symbol,Company,Earnings Call Time,EPS Estimate,Reported EPS,Surprise(%)
ATTO,Atento SA,TAS,0.09,0.03,-66.67
ALPN,Alpine Immune Sciences Inc,TAS,-0.68,-0.62,8.82
ALPN,Alpine Immune Sciences Inc,Time Not Supplied,-0.68,-0.62,8.82
HOLI,Hollysys Automation Technologies Ltd,TAS,0.48,0.49,2.08
IDSA,Industrial Services of America Inc,After Market Close,,,
AGRO,Adecoagro SA,TAS,-0.01,,
ATOS,Atossa Genetics Inc,TAS,-0.52,-0.36,30.77
AXAS,Abraxas Petroleum Corp,TAS,0.03,0.02,-33.33
ACIU,AC Immune SA,TAS,0.17,0.25,47.06
ARCO,Arcos Dorados Holdings Inc,TAS,0.08,0.13,62.5
WTER,Alkaline Water Company Inc,Time Not Supplied,-0.07,-0.07,
ALNA,Allena Pharmaceuticals Inc,Before Market Open,-0.49,-0.57,-16.33
AEYE,AudioEye Inc,TAS,-0.26,-0.27,-3.85
APLT,Applied Therapeutics Inc,Before Market Open,-0.49,-0.63,-28.57
ALT,Altimmune Inc,TAS,-0.19,-0.73,-284.21
ABEOW,Abeona Therapeutics Inc,TAS,,,
ACER,Acer Therapeutics Inc,After Market Close,-0.57,-0.52,8.77
SRNN,Southern Banc Company Inc,Time Not Supplied,,,
SPB,Spectrum Brands Holdings Inc,Before Market Open,1.11,1.13,1.8
BIOC,Biocept Inc,TAS,-0.27,-0.25,7.41
IDXG,Interpace Biosciences Inc,TAS,-0.19,-0.19,
GTBP,GT Biopharma Inc,After Market Close,,,
MTNB,Matinas BioPharma Holdings Inc,Time Not Supplied,-0.03,-0.03,
MTNB,Matinas BioPharma Holdings Inc,TAS,-0.03,-0.03,
XELB,Xcel Brands Inc,After Market Close,0.12,0.06,-50.0
BBI,Brickell Biotech Inc,After Market Close,,,
SNBP,Sun Biopharma Inc,Before Market Open,,,
BZH,Beazer Homes USA Inc,TAS,0.51,0.08,-84.31
SELB,Selecta Biosciences Inc,TAS,-0.33,-0.26,21.21
BEST,BEST Inc,Before Market Open,,0.01,
CBPO,China Biologic Products Holdings Inc,TAS,0.88,1.4,59.09
TPCS,TechPrecision Corp,TAS,,,
LK,Luckin Coffee Inc,Before Market Open,-0.37,-0.32,13.51
CYD,China Yuchai International Ltd,Before Market Open,0.45,0.17,-62.22
CCF,Chase Corp,After Market Close,,,
SMCI,Super Micro Computer Inc,After Market Close,,,
AUMN,Golden Minerals Co,TAS,,,
PGR,Progressive Corp,Before Market Open,1.3,1.33,2.31
PUMP,ProPetro Holding Corp,TAS,0.51,0.33,-35.29
CPLG,CorePoint Lodging Inc,TAS,-0.44,-0.22,50.0
CHNG,Change Healthcare Inc,After Market Close,0.27,0.27,
NOVC,Novation Companies Inc,Time Not Supplied,,,
WFCF,Where Food Comes From Inc,Before Market Open,,,
CYCCP,Cyclacel Pharmaceuticals Inc,After Market Close,,,
ISCO,International Stem Cell Corp,Before Market Open,,,
CPA,Copa Holdings SA,TAS,2.23,2.45,9.87
CSCO,Cisco Systems Inc,TAS,0.81,0.84,3.7
GMDA,Gamida Cell Ltd,TAS,-0.36,-0.3,16.67
CHRA,Charah Solutions Inc,TAS,-0.05,-0.11,-120.0
MNI,McClatchy Co,TAS,-1.01,-0.16,84.16
ENSV,Enservco Corp,TAS,-0.06,-0.1,-66.67
TK,Teekay Corp,TAS,,,
SANW,S&W Seed Co,TAS,-0.15,-0.15,
SANW,S&W Seed Co,Before Market Open,-0.15,-0.15,
CMCM,Cheetah Mobile Inc,TAS,0.14,0.49,250.0
CYRN,Cyren Ltd,TAS,-0.07,-0.06,14.29
CATS,Catasys Inc,TAS,-0.32,-0.52,-62.5
GLAD,Gladstone Capital Corp,TAS,0.21,0.21,
PING,Ping Identity Holding Corp,After Market Close,0.01,0.13,1200.0
CRWS,Crown Crafts Inc,Before Market Open,0.18,0.18,
CTRP,Ctrip.Com International Ltd,After Market Close,0.29,,
GFF,Griffon Corp,After Market Close,0.33,0.4,21.21
CLIR,Clearsign Technologies Corp,After Market Close,,,
DMAC,DiaMedica Therapeutics Inc,After Market Close,,,
DSSI,Diamond S Shipping Inc,Time Not Supplied,-0.12,-0.19,-58.33
DSSI,Diamond S Shipping Inc,TAS,-0.12,-0.19,-58.33
DYAI,Dyadic International Inc,After Market Close,,,
ONE,OneSmart International Education Group Ltd,Before Market Open,,,
EFOI,Energy Focus Inc,Before Market Open,-0.15,-0.08,46.67
EDAP,Edap Tms SA,TAS,0.04,0.03,-25.0
EYEN,Eyenovia Inc,Before Market Open,-0.34,-0.29,14.71
EQS,EQUUS Total Return Inc,After Market Close,,,
SENR,Strategic Environmental & Energy Resources Inc,Before Market Open,,,
EPSN,Epsilon Energy Ltd,TAS,,,
GRMM,Grom Social Enterprises Inc,Before Market Open,,,
ECOR,"electroCore, Inc.",TAS,-0.31,-0.36,-16.13
SD,SandRidge Energy Inc,TAS,,,
ENR,Energizer Holdings Inc,TAS,0.81,0.93,14.81
ELMD,Electromed Inc,TAS,0.01,0.12,1100.0
EVK,Ever-Glory International Group Inc,TAS,,,
FTEK,Fuel Tech Inc,After Market Close,-0.03,-0.05,-66.67
FVRR,Fiverr International Ltd,Before Market Open,-0.19,-0.12,36.84
SGRP,SPAR Group Inc,TAS,,,
NSEC,National Security Group Inc,Time Not Supplied,,,
SNDL,Sundial Growers Inc,TAS,-0.08,,
SNDL,Sundial Growers Inc,Before Market Open,-0.08,,
TCOM,Trip.com Group Ltd,TAS,,,
RAVE,Rave Restaurant Group Inc,TAS,,,
SLGG,Super League Gaming Inc,After Market Close,-0.36,-0.43,-19.44
HI,Hillenbrand Inc,After Market Close,0.73,0.76,4.11
HROW,Harrow Health Inc,TAS,-0.24,-0.29,-20.83
NVGS,Navigator Holdings Ltd,TAS,-0.07,-0.01,85.71
INFU,InfuSystem Holdings Inc,Before Market Open,,,
OSW,OneSpaWorld Holdings Ltd,Before Market Open,0.12,0.11,-8.33
VIPS,Vipshop Holdings Ltd,TAS,0.17,0.25,47.06
PRTH,Priority Technology Holdings Inc,After Market Close,-0.12,-0.08,33.33
TGC,Tengasco Inc,TAS,,,
PRSP,Perspecta Inc,After Market Close,0.51,0.54,5.88
REED,Reed's Inc,After Market Close,-0.11,-0.14,-27.27
WSTL,Westell Technologies Inc,After Market Close,,,
CSM
  • `earnings` seems to contain everything that's on the page? What else are you expecting? – Derek Eden Nov 17 '19 at 06:18
  • Can you share the output you’re getting? – AMC Nov 17 '19 at 06:52
  • @AlexanderCécile I have added the output directly to the question. – CSM Nov 17 '19 at 07:07
  • Only the first and last columns are printed so as to keep the output from being massive and difficult to read. You can even see at the end of your output that your DataFrame has 9 columns. Take a look [here](https://stackoverflow.com/a/30691921/11301900) if you want to print the entire thing. You could also use [`.info`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) to get some general information on your columns. – AMC Nov 17 '19 at 07:20
  • That `pd.option_context()` is meant to be used when _printing_ the DataFrame. If you just want to write it to a file as CSV, take a look at the [`.to_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html) method. – AMC Nov 17 '19 at 08:14
  • Also I just noticed that the way you are writing it to file is incorrect. What is inside the `earnings` list, how many elements does it contain? – AMC Nov 17 '19 at 08:17
  • @AlexanderCécile I don't like how the .to_csv() writes to the file, earnings contains all the data it gathers from the table on yahoo finance earnings. – CSM Nov 17 '19 at 08:23
  • @CSM Can you elaborate, what don’t you like? `earnings` is a list, I wanted to know how many elements it contains. – AMC Nov 17 '19 at 08:28
  • @AlexanderCécile If you check above where it says "Edit#3" you can see how the output is not formatted as it was in the earlier 2 outputs. I liked the way it printed the table in a neat and clean format when not using .to_csv(). Also earnings is a list of data from the table that it scrapes so it doesn't have a defined list of elements. It depends on how many companies are listed on the table for said day. – CSM Nov 17 '19 at 08:35
  • @CSM Those are two very different situations though. CSV isn’t meant to be pretty or easy to read for human beings, it’s just a format to store data. What are you expecting to do with the data once it is written to file? As for `elements` can you just tell me the size next time you run the program lol – AMC Nov 17 '19 at 08:39
  • @AlexanderCécile Hmm I get your point. I guess .to_csv will work since I can open the file in google sheets and view it there. Thanks for your help today. – CSM Nov 17 '19 at 08:43
  • @CSM Google sheets, Excel, and tons more. So should I edit my answer a bit to reflect the back-and-forth in the comments? – AMC Nov 17 '19 at 08:46
  • @AlexanderCécile I posted an answer to the question but feel free to post one and I can delete mine. – CSM Nov 17 '19 at 08:48
  • @CSM I’ll come back to it tomorrow when I can write properly, I’m tired lol – AMC Nov 17 '19 at 08:51
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/202522/discussion-between-alexander-cecile-and-csm). – AMC Nov 17 '19 at 19:04
  • @CSM To address edit #2: It's all the same DataFrame, the `'\'` indicates that it continues on the next line. – AMC Nov 20 '19 at 00:55
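
The comments above point out that the file-writing code in the question is incorrect. A minimal sketch of why, using a small made-up DataFrame rather than the scraped table: `csv.writer.writerow` stringifies each non-string item with `str()`, so passing `[earnings]` writes the printed representation of the whole DataFrame as one quoted cell. That would also explain the double quotes around the output in the question.

```python
import csv
import io

import pandas as pd

# Hypothetical two-row stand-in for the scraped earnings table.
df = pd.DataFrame({"Symbol": ["WUBA", "ARMK"], "EPS Estimate": [0.82, 0.69]})

# Same pattern as in the question, written to memory instead of a file.
buffer = io.StringIO()
csv.writer(buffer).writerow([df])  # a one-element list becomes one CSV field

# Parse the text back to see what was actually stored.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(len(rows), "row,", len(rows[0]), "field")
print(rows[0][0])  # the printed DataFrame, not separate values
```

Because the cell holds whatever `print(df)` would show, any display truncation or line wrapping ends up in the file as well.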

2 Answers

1

As far as I can tell, this has nothing to do with the data and everything to do with the representation. Only the first and last columns are printed so as to keep the output from being massive and difficult to read. You can even see at the end of your output that your DataFrame has 9 columns.

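Take a look [here](https://stackoverflow.com/a/30691921/11301900) if you want to print the entire thing. You could also use [`.info`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.info.html) to get some general information on your columns.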
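
As a rough illustration of the display behaviour described above (a sketch with a made-up two-row frame, not the scraped data), the column limit is set explicitly so the result does not depend on the terminal. `display.max_columns` controls the `...` placeholder, and a generous `display.width` keeps pandas from wrapping the columns into a second block marked with `\`.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: same 6 column names, 2 rows.
df = pd.DataFrame({
    "Symbol": ["WUBA", "ARMK"],
    "Company": ["58.com Inc", "Aramark"],
    "Earnings Call Time": ["Before Market Open", "Before Market Open"],
    "EPS Estimate": [0.82, 0.69],
    "Reported EPS": [float("nan"), float("nan")],
    "Surprise(%)": [float("nan"), float("nan")],
})

# A low column limit reproduces the "..." placeholder from the question.
with pd.option_context("display.max_columns", 4):
    truncated = repr(df)

# No column limit and a wide line: every column, one line per row.
with pd.option_context("display.max_columns", None, "display.width", 1000):
    full = repr(df)

print(truncated)
print(full)
df.info()  # column names, dtypes and non-null counts
```

The options only apply inside the `with` block, so the rest of the program keeps the default display settings.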

AMC
  • I am not too sure where I would use the fix suggested. I am new to coding and python. Thanks for your help so far. – CSM Nov 17 '19 at 08:03
  • I figured out how to use it but now I am having a different issue. I will post in the original section so you can see what I mean. – CSM Nov 17 '19 at 08:09
  • @CSM My point is that there isn’t really much of a problem in the first place! You were worried that you were missing some of the data, right? – AMC Nov 17 '19 at 08:10
  • I posted an edit to the main question if you can take a look at it for me. I was worried that I was missing data and that was not the case. But now I am getting the data in two separate tables. I want it to display all the information/data gathered row by row instead of splitting the columns across two different tables. – CSM Nov 17 '19 at 08:15
0

Thanks to @AlexanderCécile for the help regarding this issue.

For those interested in how he fixed my issue the code is below.

import pandas as pd
from datetime import date

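# Note: called outside a `with` block, this line has no effect, and display options do not change what to_csv writes.
pd.option_context('display.max_rows', None, 'display.max_columns', None)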
earnings = pd.read_html('https://finance.yahoo.com/calendar/earnings?day=2019-11-13')[0]

earnings.to_csv(r'C:\Users\<user>\Desktop\earnings_{}.csv'.format(date.today()), index=None)
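
To check that `to_csv` writes every column regardless of display settings, here is a small sketch that uses an in-memory buffer in place of a file path. The frame is made up for illustration, and `index=False` is used as the conventional spelling of the `index=None` above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the scraped table.
df = pd.DataFrame({
    "Symbol": ["WUBA", "ARMK"],
    "Company": ["58.com Inc", "Aramark"],
    "Earnings Call Time": ["Before Market Open", "Before Market Open"],
    "EPS Estimate": [0.82, 0.69],
    "Reported EPS": [float("nan"), float("nan")],
    "Surprise(%)": [float("nan"), float("nan")],
})

# Deliberately restrictive display options: they only affect printing.
with pd.option_context("display.max_columns", 2):
    buffer = io.StringIO()          # in-memory stand-in for a .csv file
    df.to_csv(buffer, index=False)  # leave out the integer row index

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back gives all six columns again.
round_trip = pd.read_csv(io.StringIO(csv_text))
print(list(round_trip.columns))
```

The resulting file can then be opened in a spreadsheet program for a readable view, as discussed in the comments on the question.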
CSM
  • I'm guessing you don't want to include the index in the CSV since it's just a simple integer range? Also, I don't think the use of `option_context` has any effect here, although I may be wrong. – AMC Nov 17 '19 at 19:07