0

The website is "https://www.nseindia.com/companies-listing/corporate-filings-announcements". A friend sent me the underlying link that downloads the data between two dates as a CSV file: "https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true\27". This link works fine in a web browser. First, can someone explain how he got this link, or rather how I can find it myself? Second, I am unable to read the CSV file from this link into a data frame in Python. Maybe there is an issue with the %27, or with something else. My code is:

import pandas as pd

csv_url = 'https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=15-01-2022&csv=true%27'
df = pd.read_csv(csv_url)
print(df.head())
Ravi
  • This question has already been asked here: https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url – Saurav Panda Jan 20 '22 at 18:23
  • My issue could be how to pass %27, which is part of the URL string – Ravi Jan 20 '22 at 18:31
  • I just opened your link in the web browser, and it says resource not found. Are you sure it is the correct link? Also, what's the use of %27 in the link? – Saurav Panda Jan 20 '22 at 18:33
  • I looked at the answer you mentioned. My program goes into debug mode – Ravi Jan 20 '22 at 18:35
  • Do you need any authentication to open this URL? I am trying to open that link in my browser and it says resource not found, and my pandas code is stuck, maybe because the resource is not available. – Saurav Panda Jan 20 '22 at 18:36
  • Sorry, the link that works is https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true\27 – Ravi Jan 20 '22 at 18:47
  • Please use https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true\27 – Ravi Jan 20 '22 at 18:50
  • The right URL is https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true\27 – Ravi Jan 20 '22 at 18:53
  • Saurav, I found out that https://www.nseindia.com/companies-listing/corporate-filings-announcements should first be opened in a tab; then the URL https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true\27 works – Ravi Jan 20 '22 at 19:22
  • If my answer worked for you please mark this answer as correct answer! – Saurav Panda Jan 24 '22 at 20:18

2 Answers

0

Use wget.py:

import wget

# Remote file to download
DATA_URL = 'http://www.robots.ox.ac.uk/~ankush/data.tar.gz'

# DATA_URL = '/home/xxx/book/data.tar.gz'

# Name of the local file to write
out_fname = 'abc.tar.gz'

wget.download(DATA_URL, out=out_fname)

  • Please add the code in the code section and elaborate more on the answer. Like what is wget.py, is it a library or custom script? – Saurav Panda Jan 20 '22 at 18:24
  • Can you please try with the URL given? I am unsure how to represent the \%27 at the end of the URL. That seems to be the problem – Ravi Jan 20 '22 at 18:34
  • The URL that downloads the CSV in a web browser is https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true\27 – Ravi Jan 20 '22 at 18:52
  • As it's currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center. – Community Jan 21 '22 at 00:54
0

Okay, so for this issue you first need to request the NSE website with headers, as mentioned in this post. Once you hit the main website, you get some cookies in your session, and with those you can request your desired URL. To convert the response data into a string that pandas can read, I followed this answer.

Make sure to include the custom user agent in the headers, or else the request will fail.

import pandas as pd
import io
import requests

base_url = 'https://www.nseindia.com'
session = requests.Session()
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, '
                         'like Gecko) '
                         'Chrome/80.0.3987.149 Safari/537.36',
    'accept-language': 'en,gu;q=0.9,hi;q=0.8',
    'accept-encoding': 'gzip, deflate, br'}

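# Hit the main site first so the session picks up the cookies it sets
r = session.get(base_url, headers=headers, timeout=5)
cookies = dict(r.cookies)
# The same session (and its cookies) is then used for the CSV endpoint
response = session.get('https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true', timeout=5, headers=headers)

# Decode the raw bytes and let pandas parse them as CSV text
content = response.content
df = pd.read_csv(io.StringIO(content.decode('utf-8')))
print(df.head())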
print(df.head())
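
On the %27 part of the question: %27 is the percent-encoded form of an apostrophe ('). The URL used in the code above leaves it out, so it was probably a stray quote character picked up when the link was copied, though that is a guess. Below is a minimal sketch, using only the standard library, of how such a trailing quote could be stripped and how the same URL could be rebuilt for other dates. The names API_PATH, strip_stray_quote and build_url are made up for this illustration and are not part of any NSE API.

```python
from urllib.parse import parse_qsl, unquote, urlencode, urlsplit, urlunsplit

# Endpoint taken from the URL in the question
API_PATH = "https://www.nseindia.com/api/corporate-announcements"


def strip_stray_quote(url: str) -> str:
    """Remove a trailing apostrophe (literal ' or encoded %27) from each query value."""
    parts = urlsplit(url)
    # parse_qsl percent-decodes the values, so "true%27" becomes "true'"
    pairs = [(key, value.rstrip("'")) for key, value in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def build_url(from_date: str, to_date: str) -> str:
    """Build the CSV download URL for a date range given as DD-MM-YYYY strings."""
    params = {
        "index": "equities",
        "from_date": from_date,
        "to_date": to_date,
        "csv": "true",
    }
    return f"{API_PATH}?{urlencode(params)}"


if __name__ == "__main__":
    print(repr(unquote("%27")))  # shows what %27 decodes to
    print(build_url("14-01-2022", "20-01-2022"))
```

The string returned by build_url can be passed to session.get in place of the hard-coded URL above; the cookies and headers are still needed.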
Saurav Panda