I've been trying to download .txt files from a website using the Requests module. When I log in manually with my username and password, I can see the actual data in my browser:
Point Code Issue Date Trade Date Region Pricing Point Low High Average Volume Deals Delivery Start Date Delivery End Date
RMTNWW 2018-10-09 2018-10-08 Rocky Mountains Northwest Wyoming Pool 2.910 2.955 2.935 323 44 2018-10-09 2018-10-09
RMTOPAL 2018-10-09 2018-10-08 Rocky Mountains Opal 2.925 3.050 2.965 209 40 2018-10-09 2018-10-09
But when I request the same page from my script and print the content with
print(page.content)
the output is the HTML source of a page instead:
b'<!DOCTYPE html>\n<html>\n<head>\n\n<meta name="csrf-param" content="authenticity_token"/>\n<meta name="csrf-token" content="s35g4TAUN6+5V8Xi8x7u6f2FwziX3gbW9iY9D45HnEw="/>\n<meta http-equiv="content-type" content="text/html;charset=utf-8">
\n<meta name="description" content="Natural Gas Intelligence">\n<meta name="keywords" content="gas, natural gas, natural gas prices, enery prices, NYMEX, nymex settlement, aga, storage, natural gas data, henry hub, ferc, power, electricity, electric, megawatt, methane, reliability, inside, ngi">\n\n\n\n<meta content="false" name="has-log-view" />\n<!--<meta content="IE=EmulateIE7" http-equiv="X-UA-Compatible"/>
.
.
.
None of the fields shown above (Point Code, Issue Date, etc.) appear anywhere in this HTML, so I suspect this is a login problem. The sign-on URL is https://www.naturalgasintel.com/user/login
whereas the file is located at https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt.
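One way to test the login suspicion, as a rough sketch: check whether the returned body looks like an HTML page, and pull the CSRF token out of the `csrf-token` meta tag visible in the output above. The helper names `looks_like_html` and `extract_csrf_token` are made up for illustration, and the regex assumes the attribute order shown in that output (`name` before `content`).

```python
import re

def looks_like_html(body: bytes) -> bool:
    """Return True if the response body starts like an HTML document."""
    head = body.lstrip()[:100].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")

def extract_csrf_token(html: str):
    """Return the content of <meta name="csrf-token" ...>, or None if absent."""
    match = re.search(r'<meta\s+name="csrf-token"\s+content="([^"]*)"', html)
    return match.group(1) if match else None

# Start of the output printed by the script, copied from above.
sample = (
    b'<!DOCTYPE html>\n<html>\n<head>\n\n'
    b'<meta name="csrf-param" content="authenticity_token"/>\n'
    b'<meta name="csrf-token" content="s35g4TAUN6+5V8Xi8x7u6f2FwziX3gbW9iY9D45HnEw="/>\n'
)
print(looks_like_html(sample))
print(extract_csrf_token(sample.decode("utf-8")))
```

If the first check is true for `page.content`, the server most likely answered with a regular site page (for example a login page) rather than the data file.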
My script is:
import requests
with requests.Session() as c:
    data_url = 'https://naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/'
    username = ''
    password = ''
    login_data = dict(username=username, password=password)
    c.post(data_url, data=login_data, headers={'Referer':'https://www.naturalgasintel.com/'})
    page = c.get('https://www.naturalgasintel.com/ext/resources/Data-Feed/Daily-GPI/2018/10/20181009td.txt', stream=True)
    print(page.content)
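The script posts the credentials to the data directory rather than to the sign-on URL, so one thing worth trying is to post them to https://www.naturalgasintel.com/user/login first, on the same session. The sketch below is only that, a sketch: the function `fetch_data_file` is hypothetical, the form field names `username` and `password` are simply carried over from the script and may not match the real login form, and `authenticity_token` is taken from the `csrf-param` meta tag in the printed HTML.

```python
import re

LOGIN_URL = 'https://www.naturalgasintel.com/user/login'
FILE_URL = ('https://www.naturalgasintel.com/ext/resources/Data-Feed/'
            'Daily-GPI/2018/10/20181009td.txt')

def fetch_data_file(session, username, password):
    """Log in on `session`, then return the text of the data file.

    `session` is expected to behave like requests.Session (get/post).
    """
    # Load the login page first so the session picks up cookies and a token.
    login_page = session.get(LOGIN_URL)
    match = re.search(r'name="csrf-token"\s+content="([^"]*)"', login_page.text)

    # ASSUMPTION: these field names come from the original script and the
    # csrf-param meta tag; the real form may use different names.
    payload = {'username': username, 'password': password}
    if match:
        payload['authenticity_token'] = match.group(1)

    session.post(LOGIN_URL, data=payload, headers={'Referer': LOGIN_URL})

    # Request the file on the same session so the login cookies are sent.
    page = session.get(FILE_URL)
    page.raise_for_status()
    if page.text.lstrip().lower().startswith('<!doctype html'):
        raise RuntimeError('Got an HTML page back; the login probably failed')
    return page.text
```

It would be called as `fetch_data_file(c, username, password)` inside the `with requests.Session() as c:` block. Inspecting the login form in the browser's developer tools should show which field names the site actually expects.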
I'd like to save the actual .txt contents of the page, not the HTML source, by using the open function to write the contents into a file with something like:
localfile = 'output_{}.csv'
with open(localfile, "w", encoding="utf-8") as datafile:
    # page is a Response object; write its decoded text, not the object itself
    datafile.write(page.text)
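For the saving step, a minimal sketch: `Response.text` is a `str` while `Response.content` is `bytes`, so the text is what a file opened in text mode expects. The function name `save_text`, the stand-in string, and the date used to fill the `{}` placeholder are illustrative assumptions.

```python
import os
import tempfile

def save_text(text: str, path: str) -> None:
    """Write `text` to `path` as UTF-8; the with-block closes the file."""
    # newline='' keeps the line endings exactly as they are in `text`.
    with open(path, 'w', encoding='utf-8', newline='') as datafile:
        datafile.write(text)

# Stand-in string for the demo; in the real script this would be page.text.
sample_text = 'Point Code Issue Date Trade Date\n'
out_path = os.path.join(tempfile.gettempdir(),
                        'output_{}.csv'.format('20181009'))
save_text(sample_text, out_path)
```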
How can I get the file's contents instead of the HTML source?