I need to fetch data programmatically from JSF sites.
Here's an example: https://dataminer.pjm.com/dataminerui/pages/public/lmp.jsf
To get data, enter any Start Date and End Date and click Export CSV at the top right. (It generates a fair amount of data, so pick a 1-day range.)
In the Network tab of Chrome, I see the following request headers and form data:
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:en-US,en;q=0.8,ko;q=0.6,zh;q=0.4
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:425
Content-Type:application/x-www-form-urlencoded
Cookie:JSESSIONID=gixQBXBESRofyqLpiH2hlYg8; dataminer=1369707692.36895.0000; __utma=109610308.1662709339.1456530705.1456530705.1456530705.1; __utmc=109610308; __utmz=109610308.1456530705.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); JSESSIONID=8sx6CTIQhpPAAO5+4xcGGGlb; WT_FPC=id=xxx.xxx.xxx.xx-3069233008.30503152:lv=1456533141859:ss=1456530705581
Host:dataminer.pjm.com
Origin:https://dataminer.pjm.com
Referer:https://dataminer.pjm.com/dataminerui/pages/public/lmp.jsf
Upgrade-Insecure-Requests:1
User-Agent:Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36
Form Data
frmCriteria:frmCriteria
frmCriteria:calStartDate_input:01/01/2016
frmCriteria:calStopDate_input:01/02/2016
frmCriteria:mnuMarket_input:REALTIME
frmCriteria:mnuMarket_focus:
frmCriteria:mnuFreq_input:Daily
frmCriteria:mnuFreq_focus:
frmCriteria:mnuPnodes_input:All
frmCriteria:mnuPnodes_focus:
javax.faces.ViewState:8578362602192686517:-1021667131748875106
frmCriteria:j_idt78:frmCriteria:j_idt78
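For reference, here is how I read that Form Data listing. Chrome prints each pair as name:value, and JSF field names themselves contain a colon, so `frmCriteria:calStartDate_input:01/01/2016` should be the field `frmCriteria:calStartDate_input` with the value `01/01/2016`. Below is a minimal sketch of the same pairs as a Python dict, URL-encoded with the standard library. The variable names `form_fields` and `body` are mine, and the ViewState value is just the one captured above, so it is presumably only valid for that browser session.

```python
from urllib.parse import urlencode

# Each key is the full JSF field name, colon included; the value is
# whatever follows the last name/value separator in the DevTools listing.
form_fields = {
    "frmCriteria": "frmCriteria",
    "frmCriteria:calStartDate_input": "01/01/2016",
    "frmCriteria:calStopDate_input": "01/02/2016",
    "frmCriteria:mnuMarket_input": "REALTIME",
    "frmCriteria:mnuMarket_focus": "",
    "frmCriteria:mnuFreq_input": "Daily",
    "frmCriteria:mnuFreq_focus": "",
    "frmCriteria:mnuPnodes_input": "All",
    "frmCriteria:mnuPnodes_focus": "",
    # Captured from one browser session; assumed to be stale elsewhere.
    "javax.faces.ViewState": "8578362602192686517:-1021667131748875106",
    "frmCriteria:j_idt78": "frmCriteria:j_idt78",
}

# application/x-www-form-urlencoded body, as the browser would send it.
body = urlencode(form_fields)
print(len(form_fields), "fields")
print(body)
```

Encoding the dict this way percent-encodes the colons and slashes, which is what the `Content-Type: application/x-www-form-urlencoded` header in the capture implies.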
I see all my form data in this request. It seems like I should be able to programmatically download this CSV by submitting the right request (using Python's requests library).
I've tried lots of ways of regenerating this header and form data, but can't seem to produce the CSV download.
Edit: I've tried the following. I know very little about the structure of HTTP requests and responses, and cookies, so this could be comically bad. I get a 500 on the POST.
import requests

headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.8,ko;q=0.6,zh;q=0.4',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Content-Length': 425,
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'dataminer.pjm.com',
    'Origin': 'https://dataminer.pjm.com',
    'Referer': 'https://dataminer.pjm.com/dataminerui/pages/public/lmp.jsf',
    'Upgrade-Insecure-Requests': 1,
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.86 Safari/537.36'
}

data = {
    'frmCriteria': 'frmCriteria',
    'frmCriteria': 'calStartDate_input:01/01/2016',
    'frmCriteria': 'calStopDate_input:01/02/2016',
    'frmCriteria': 'mnuMarket_input:REALTIME',
    'frmCriteria': 'mnuMarket_focus:',
    'frmCriteria': 'mnuFreq_input:Daily',
    'frmCriteria': 'mnuFreq_focus:',
    'frmCriteria': 'mnuPnodes_input:All',
    'frmCriteria': 'mnuPnodes_focus:',
    'javax.faces.ViewState': '8578362602192686517:-1021667131748875106',
    'frmCriteria:j_idt78': 'frmCriteria:j_idt78'
}

url = 'https://dataminer.pjm.com/dataminerui/pages/public/lmp.jsf'

with requests.Session() as s:
    get_response = s.get(url)
    post_response = s.post(url, headers=headers, data=data)
How can I use the requests library to fetch the CSV?
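A few things in the attempt above look suspect and may explain the 500. The `data` dict repeats the key `'frmCriteria'`, so Python keeps only the last of those entries. The `javax.faces.ViewState` value is hardcoded from an old browser session. And `Content-Length` and `Host` are set by hand (with an integer value, which recent versions of requests reject), although requests computes both itself. Below is a rough sketch of the flow I would expect to be closer, assuming the page embeds the ViewState in a hidden `<input>` as JSF pages usually do. The names `ViewStateParser`, `extract_view_state`, and `fetch_csv` are hypothetical helpers, `session` is assumed to behave like `requests.Session`, and none of this has been confirmed against the PJM site.

```python
from html.parser import HTMLParser


class ViewStateParser(HTMLParser):
    """Records the value of the first hidden javax.faces.ViewState input."""

    def __init__(self):
        super().__init__()
        self.view_state = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == "javax.faces.ViewState":
            if self.view_state is None:
                self.view_state = attrs.get("value")


def extract_view_state(html):
    """Return the ViewState token found in the page HTML."""
    parser = ViewStateParser()
    parser.feed(html)
    if parser.view_state is None:
        raise ValueError("no javax.faces.ViewState input found")
    return parser.view_state


def fetch_csv(session, url, form_fields):
    """GET the page for fresh cookies and ViewState, then POST the form.

    `session` is assumed to offer get/post like requests.Session.
    """
    page = session.get(url)
    page.raise_for_status()

    # Copy the fields and swap in the token that belongs to this session.
    payload = dict(form_fields)
    payload["javax.faces.ViewState"] = extract_view_state(page.text)

    # No Content-Length, Host, or Cookie headers: the session supplies them.
    response = session.post(url, data=payload, headers={"Referer": url})
    response.raise_for_status()
    return response.content
```

With requests installed, this would be called as `fetch_csv(s, url, form_fields)` inside `with requests.Session() as s:`, where `form_fields` uses the full field names (for example `'frmCriteria:calStartDate_input'`) as keys. The button id `frmCriteria:j_idt78` is auto-generated by JSF, so it may also need to be read from the page rather than hardcoded, and whether the server then returns the CSV directly is something I have not verified.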