
This is the URL of the website: https://dataviewer.pjm.com/dataviewer/pages/public/tieflows.jsf The data I want to scrape is in the pop-up window that appears after clicking “Tie Values”.

This is my Python script:

import requests
from bs4 import BeautifulSoup

url = 'https://dataviewer.pjm.com/dataviewer/pages/public/tieflows.jsf'

data = {"javax.faces.partial.ajax": "true",
        "javax.faces.source": "chart1:j_idt293",
        "javax.faces.partial.execute": "@all",
        "javax.faces.partial.render": "chart1:valuesPanel",
        "chart1:j_idt293": "chart1:j_idt293",
        "chart1":"chart1",
        "chart1:selectedTie": "",
        "chart1:tieFlowState": "ACTUAL",
        "chart1:chart1tieflowDataTable_scrollState": "0,0"}

head = {}

with requests.Session() as s:
    init = s.get(url)
    soup = BeautifulSoup(init.text, "lxml")
    val = soup.find('input', {'id': 'javax.faces.ViewState'}).get('value')
    print(val)
    data["javax.faces.ViewState"] = val
    r = s.post(url, data=data, headers=head)
    print(r.text.strip())

I got the response:

<?xml version="1.0" encoding="UTF-8"?>
<partial-response>
   <changes>
      <update id="formLeftPanel:dialogMessages"><![CDATA[<div id="formLeftPanel:dialogMessages" class="ui-messages ui-widget" aria-live="polite"></div>]]></update>
      <update id="globalMessages"><![CDATA[<div id="globalMessages" class="ui-messages ui-widget" aria-live="polite"></div>]]></update>
      <update id="javax.faces.ViewState"><![CDATA[-459631281975233795:-1867808435873449001]]></update>
   </changes>
</partial-response>
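To check a response programmatically instead of by eye, the partial-response can be parsed with the standard library. This is a minimal sketch, and the function name parse_partial_response is hypothetical: it maps each <update> id to its CDATA content, so you can test whether chart1:valuesPanel came back and read the javax.faces.ViewState value the server sent in that update.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def parse_partial_response(body: str | bytes) -> dict[str, str]:
    """Map each <update id="..."> in a JSF partial-response to its text content.

    The XML parser unwraps the CDATA sections, so the values are the raw HTML
    fragments (or, for javax.faces.ViewState, the state token itself).
    """
    root = ET.fromstring(body)
    return {update.get("id", ""): update.text or "" for update in root.iter("update")}
```

For example, inside the with block: updates = parse_partial_response(r.content), then "chart1:valuesPanel" in updates. For the response above that check is False; only the two message panels and the ViewState are present.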

It is missing the update for the data part, i.e. the chart1:valuesPanel component requested in javax.faces.partial.render.

When I run the JavaScript code below in the browser console, I do get the data part. Does anybody know why? Thanks.

My JavaScript code is:

const viewState = document.querySelector('#chart1 > input[name="javax.faces.ViewState"]').value

const formData = new FormData()
formData.set("javax.faces.partial.ajax", true)
formData.set("javax.faces.source", "chart1:j_idt293")
formData.set("javax.faces.partial.execute", "@all")
formData.set("javax.faces.partial.render", "chart1:valuesPanel")
formData.set("chart1:j_idt293", "chart1:j_idt293")
formData.set("chart1", "chart1")
formData.set("chart1:selectedTie", "")
formData.set("chart1:tieFlowState", "ACTUAL")
formData.set("chart1:chart1tieflowDataTable_scrollState", "0,0")
formData.set("javax.faces.ViewState", viewState)

const url = "https://dataviewer.pjm.com/dataviewer/pages/public/tieflows.jsf"

const response = await fetch(url, {
  method: "POST",
  headers: {
  },
  body: formData
})

console.log(await response.text())
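One visible difference between the two requests: new FormData() makes fetch send the body as multipart/form-data, while requests with data= sends application/x-www-form-urlencoded. Whether this server treats the two encodings the same is not something I can confirm, so the sketch below is only a way to rule that difference out. It relies on requests' files= parameter, where a (None, value) tuple is sent as a plain field with no filename; the helper names as_multipart_fields and post_like_fetch are hypothetical.

```python
from __future__ import annotations

from typing import Any


def as_multipart_fields(fields: dict[str, str]) -> dict[str, tuple[None, str]]:
    """Reshape form fields for requests' files= argument.

    A (None, value) tuple has no filename, so requests encodes it as an ordinary
    text part of a multipart/form-data body, like FormData.set(name, value).
    """
    return {name: (None, value) for name, value in fields.items()}


def post_like_fetch(session: Any, url: str, fields: dict[str, str]) -> Any:
    """POST fields as multipart/form-data, mirroring fetch(url, {body: formData}).

    `session` is assumed to be the requests.Session that loaded the page, so its
    cookies match the ViewState stored in `fields`.
    """
    return session.post(url, files=as_multipart_fields(fields))
```

Inside the with block this would replace the s.post(...) line with r = post_like_fetch(s, url, data). requests generates the multipart Content-Type header (including the boundary) itself, so head should not set Content-Type.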
Grant
  • You are not sending a session ID with your python script. – Selaron Mar 11 '19 at 07:14
  • @Selaron, thanks for your reply. Could you be more specific? How can I send the session ID? I assumed the session would be taken care of by the with statement. – Grant Mar 11 '19 at 15:49
  • Are you sure the Python session is tracking (and submitting) cookies? If so, my comment is wrong; I'm sorry. – Selaron Mar 11 '19 at 15:55
  • I think I know why the data is retrieved by JavaScript in the console but not by Python, but I still do not know how to solve it in Python. "chart1" is the form ID, and it is not in the page returned to the Python request: `import requests; from bs4 import BeautifulSoup; r = requests.get('https://dataviewer.pjm.com/dataviewer/pages/public/tieflows.jsf'); soup = BeautifulSoup(r.text, 'html.parser'); soup.find_all(id='chart1')` returns an empty list, so 'chart1' is not found. But for JavaScript in the console window, the 'chart1' form already exists. Any idea how to solve this issue? – Grant Mar 11 '19 at 16:48
  • Does `soup.find('input', {'id': 'javax.faces.ViewState'}).get('value')` find your ViewState at all? I'm asking because the hidden input's id is not `javax.faces.ViewState` but `formId:javax.faces.ViewState`, or in your case `chart1:javax.faces.ViewState`. Only the `name` attribute is `javax.faces.ViewState`, which is how you query it in JavaScript. – Selaron Mar 12 '19 at 12:15
  • This could be helpful: [JSF ViewState, CSRF Hacker Attacks](https://www.beyondjava.net/jsf-viewstate-and-csrf-hacker-attacks) – Martin Evans Mar 12 '19 at 12:33
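Following up on the last two comments, here is a small dependency-free sketch (html.parser from the standard library instead of BeautifulSoup; the names ViewStateFinder and find_view_states are hypothetical) that looks up the ViewState by its name attribute, as the JavaScript does, and records which form each one belongs to. Running it on init.text answers both questions at once: whether a ViewState was found at all, and whether a form with id chart1 exists in the HTML Python receives. If chart1 is missing, one possible explanation (a hypothesis, not verified against this site) is that the browser builds that form through additional requests after the initial page load, so a POST naming chart1:j_idt293 as its source refers to a component the server-side view does not contain.

```python
from __future__ import annotations

from html.parser import HTMLParser

VIEW_STATE_NAME = "javax.faces.ViewState"


class ViewStateFinder(HTMLParser):
    """Record the javax.faces.ViewState hidden input of each <form>, keyed by form id."""

    def __init__(self) -> None:
        super().__init__()
        self.view_states: dict[str | None, str] = {}
        self._form_id: str | None = None  # id of the <form> currently being parsed

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._form_id = attributes.get("id")
        # Match on name, not id: the id may carry a prefix such as "chart1:".
        elif tag == "input" and attributes.get("name") == VIEW_STATE_NAME:
            self.view_states[self._form_id] = attributes.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._form_id = None


def find_view_states(html: str) -> dict[str | None, str]:
    """Return {form id: ViewState value}; inputs outside any form are keyed by None."""
    finder = ViewStateFinder()
    finder.feed(html)
    finder.close()
    return finder.view_states
```

Usage: states = find_view_states(init.text); print(list(states)). If "chart1" is among the keys, states["chart1"] corresponds to the value the JavaScript reads with '#chart1 > input[name="javax.faces.ViewState"]'.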
