
I am new to Python, so I need a little help here. I have a dataframe with a URL column; each link lets me download a CSV. My aim is to create a loop (or whatever works) so that I can run one command that downloads and reads the CSV and creates a dataframe for each row. Any help would be appreciated. I have attached part of the dataframe below. If the links don't work (they probably won't), you can replace one with a link from 'https://finance.yahoo.com/quote/GOOG/history?p=GOOG' (any other company works too): navigate to the download-CSV option and use that link.

Dataframe:

Symbol         Link
YI             https://query1.finance.yahoo.com/v7/finance/download/YI?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
PIH            https://query1.finance.yahoo.com/v7/finance/download/PIH?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
TURN           https://query1.finance.yahoo.com/v7/finance/download/TURN?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E
FLWS           https://query1.finance.yahoo.com/v7/finance/download/FLWS?period1=1383609600&period2=1541376000&interval=1d&events=history&crumb=PMHbxK/sU6E

Thanks again.

cloudly lemons
  • The website uses cookies to deliver content. First try downloading a single file programmatically (through something like [`scrapy`](https://scrapy.org/) or `selenium`), and try to understand how the website uses the cookies; then you can loop through all the URLs. – Edgar Ramírez Mondragón Nov 05 '18 at 18:18
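
For illustration, a minimal sketch of that cookie idea with plain requests (the assumption that a shared requests.Session picks up enough cookies is mine; the crumb handling may need more work):

import requests

# A session persists whatever cookies the site sets across requests.
session = requests.Session()
# Visit the quote page first so the session collects Yahoo's cookies.
session.get('https://finance.yahoo.com/quote/GOOG/history?p=GOOG')
# The download request then reuses those cookies automatically.
r = session.get('https://query1.finance.yahoo.com/v7/finance/download/GOOG'
                '?period1=1383609600&period2=1541376000&interval=1d&events=history')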

4 Answers


There are multiple ways to get CSV data from URLs. For your example, Yahoo Finance, you can copy the historical-data link and pass it straight to pandas:

import pandas as pd
HISTORICAL_URL = "https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1582781719&period2=1614404119&interval=1d&events=history&includeAdjustedClose=true"

df = pd.read_csv(HISTORICAL_URL)

A more general pattern is to use requests (or httpx) to make a GET or POST request and then pass the response contents to pandas through io:

import pandas as pd
import requests
import io

url = 'https://query1.finance.yahoo.com/v7/finance/download/GOOG'
params = {
    'period1': 1538761929,
    'period2': 1541443929,
    'interval': '1d',
    'events': 'history',
    'crumb': 'v4z6ZpmoP98',
}

r = requests.post(url, data=params)
if r.ok:
    # decode the raw bytes and let pandas parse the CSV text
    data = r.content.decode('utf8')
    df = pd.read_csv(io.StringIO(data))

To get the params, I just followed the link and copied everything after '?'. Check that they match ;)
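
To cover the looping part of the question, here is a minimal sketch (assuming df is the question's dataframe and that each link already serves plain CSV, i.e. the cookie/crumb issue above is solved):

import pandas as pd

# Assumes df has the question's 'Symbol' and 'Link' columns.
frames = {}
for row in df.itertuples():
    # One DataFrame per link, keyed by its ticker symbol.
    frames[row.Symbol] = pd.read_csv(row.Link)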


Update:

If the raw CSV contents are served directly at the URL, just pass the URL to pd.read_csv. Example with data read directly from a URL:

data_url = 'https://raw.githubusercontent.com/pandas-dev/pandas/master/pandas/tests/data/iris.csv'

df = pd.read_csv(data_url)
Prayson W. Daniel

I routinely use this procedure:

import pandas as pd
import requests
import io

url = "<URL TO DOWNLOAD.CSV>"
# read_csv expects a path or a file-like object, so wrap the raw bytes first
s = requests.get(url).content
c = pd.read_csv(io.StringIO(s.decode('utf-8')))
Azi_bel

First break the task down into smaller parts. What you need to do is:

  1. Iterate over the DataFrame with the links.

    for index, row in df.iterrows():
        url = row["Link"]
    
  2. Download the CSV file from Yahoo Finance using Python's requests library. This is probably the difficult part: you will need to get cookies before actually downloading the CSV file (more info here, here and here). Once you build the proper URL with the cookie, you can download it with:

    re = requests.get(URL)
    print(re.status_code)  # status code 200 means a successful download
    
  3. Optionally, you can save the response to your local disk.
  4. Load it with pandas (the sketch after this list combines the steps).

    df = pd.read_csv(file_name)               # if you saved the file to disk
    df = pd.read_csv(io.BytesIO(re.content))  # directly from the response
    
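Putting the steps together, a rough end-to-end sketch (assuming the cookie issue from step 2 is already handled in the URLs stored in df['Link']; the per-symbol file naming is my own choice):

import io
import pandas as pd
import requests

frames = {}
for index, row in df.iterrows():
    re = requests.get(row["Link"])
    if re.status_code == 200:
        # Step 3 (optional): keep a local copy named after the symbol.
        with open(row["Symbol"] + ".csv", "wb") as f:
            f.write(re.content)
        # Step 4: load the response directly with pandas.
        frames[row["Symbol"]] = pd.read_csv(io.BytesIO(re.content))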
HUSMEN

If you apply the following to the dataframe, it will place each of the documents in a list, not in a dataframe (I'm unsure of how to get there). But this will give you access to all the files, and it's only a matter of putting them in a df.

import requests

links = test['Link'].unique()

a = []
for url in links:
    # collect each file's raw CSV bytes
    a.append(requests.get(url).content)

a[4] or np.array(a[4]).tolist() outputs the entire file, just in the wrong format.
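
To get from those raw bytes to dataframes, one option is to wrap each entry in io.BytesIO (a sketch, assuming every request returned a valid CSV body):

import io
import pandas as pd

# Parse each downloaded CSV body into its own DataFrame.
dfs = [pd.read_csv(io.BytesIO(content)) for content in a]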

Use 'https://api.iextrading.com/1.0/stock/GOOG/chart/5y?format=csv' rather than Yahoo; it is much more accessible.

pandasman