I'm working on my first web crawler. I'm trying to get data on telephone numbers in Mexico from the site that provides it (site), which works with XHR requests. This is my code so far:
```python
from requests import Session
import xml.etree.ElementTree as ET

url = 'https://sns.ift.org.mx:8081/sns-frontend/consulta-numeracion/numeracion-geografica.xhtml'
s = Session()  # keeps the session cookie between requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36',
    # No 'Content-Type: text/html' here: it is wrong for form posts, and
    # requests sets the correct one itself when data= is a dict.
}
str1 = s.get(url, headers=headers)  # Loading the page
xhtml = str1.text.encode('utf-8')
# Saving the first response, to get the ViewState
with open("loaded.txt", "wb") as text_file:
    text_file.write(xhtml)
x = ET.fromstring(xhtml)
path = './/*[@id="javax.faces.ViewState"]'
for i in x.findall(path):
    VS = i.attrib['value']  # ViewState
    print(VS)
```
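`ET.fromstring` only works if the page happens to be well-formed XML; an HTML entity such as `&nbsp;` or an unclosed tag makes it raise. A more tolerant sketch uses the standard library's `html.parser` and matches the hidden field by its `name`, which JSF sets to `javax.faces.ViewState`, rather than by its `id`, which (as far as I know) differs between JSF versions. `ViewStateParser` and `extract_view_state` are hypothetical names, not part of any library.

```python
from html.parser import HTMLParser


class ViewStateParser(HTMLParser):
    """Collects the value of the first hidden input named javax.faces.ViewState."""

    def __init__(self):
        super().__init__()
        self.view_state = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Match on the name attribute: the id may be "javax.faces.ViewState"
        # or something like "j_id1:javax.faces.ViewState:0" depending on the version.
        if (tag == "input" and self.view_state is None
                and attributes.get("name") == "javax.faces.ViewState"):
            self.view_state = attributes.get("value")

    # Self-closing tags such as <input ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


def extract_view_state(page_html):
    """Return the first javax.faces.ViewState value in page_html, or None."""
    parser = ViewStateParser()
    parser.feed(page_html)
    parser.close()
    return parser.view_state
```

With the session above this would be used as `VS = extract_view_state(str1.text)`.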
At this point I have the page's ViewState. Next, I send a new POST with the form data, the number I want to look up, and the ViewState:
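```python
data = {
    "javax.faces.partial.ajax": "true",
    "javax.faces.source": "FORM_myform:BTN_publicSearch",
    "javax.faces.partial.execute": "@all",
    # Space-separated ids: the "+" shown in the browser's encoded form data is a space
    "javax.faces.partial.render": "FORM_myform:P_containerConsulta FORM_myform:P_containerpoblaciones FORM_myform:P_containernumeracion FORM_myform:P_containerinfo FORM_myform:P_containerLocal FORM_myform:P_containerDesplegable",
    "FORM_myform:BTN_publicSearch": "FORM_myform:BTN_publicSearch",
    "FORM_myform": "FORM_myform",
    "FORM_myform:TXT_NationalNumber": "6564384757",
    "javax.faces.ViewState": VS,  # ViewState (key without a trailing "=")
}
# Headers a browser typically sends with a JSF AJAX request; no Content-Type,
# because requests sets application/x-www-form-urlencoded for a dict payload.
ajax_headers = {
    'User-Agent': headers['User-Agent'],
    'Faces-Request': 'partial/ajax',
    'X-Requested-With': 'XMLHttpRequest',
}
req = s.post(url, data=data, headers=ajax_headers)
# Saving the new response, this is supposed to bring the results
with open("Output.txt", "wb") as text_file:
    text_file.write(req.text.encode('utf-8'))
```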
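Two details in this form data are easy to get wrong when copying it from the browser's network tab: the ViewState key must be exactly `javax.faces.ViewState`, and the ids in `javax.faces.partial.render` are separated by spaces, which the browser displays as `+` in URL-encoded data. The sketch below builds the payload with a hypothetical helper `build_search_payload` and shows with `urlencode` what happens to a literal `+`. The two headers in the hypothetical constant `AJAX_HEADERS` are, I believe, what JSF/PrimeFaces AJAX requests normally carry; treat them as an assumption and compare them with what your browser actually sends.

```python
from urllib.parse import urlencode

# Component ids copied from the request above; they are specific to this page.
RENDER_IDS = [
    "FORM_myform:P_containerConsulta",
    "FORM_myform:P_containerpoblaciones",
    "FORM_myform:P_containernumeracion",
    "FORM_myform:P_containerinfo",
    "FORM_myform:P_containerLocal",
    "FORM_myform:P_containerDesplegable",
]

# Assumed AJAX headers; no Content-Type, requests sets it for dict payloads.
AJAX_HEADERS = {
    "Faces-Request": "partial/ajax",
    "X-Requested-With": "XMLHttpRequest",
}


def build_search_payload(number, view_state):
    """Form fields for the JSF AJAX search request (hypothetical helper)."""
    return {
        "javax.faces.partial.ajax": "true",
        "javax.faces.source": "FORM_myform:BTN_publicSearch",
        "javax.faces.partial.execute": "@all",
        # Join with real spaces; urlencode turns them into "+" on the wire.
        "javax.faces.partial.render": " ".join(RENDER_IDS),
        "FORM_myform:BTN_publicSearch": "FORM_myform:BTN_publicSearch",
        "FORM_myform": "FORM_myform",
        "FORM_myform:TXT_NationalNumber": number,
        "javax.faces.ViewState": view_state,
    }


# A literal "+" in a value is percent-encoded, so it no longer separates ids:
print(urlencode({"javax.faces.partial.render": "a+b"}))
```

It would be sent with `s.post(url, data=build_search_payload("6564384757", VS), headers=AJAX_HEADERS)`. Because `data` is a dict, requests encodes it as `application/x-www-form-urlencoded` and sets that Content-Type itself; overriding it with `text/html` can stop the server from reading the form fields at all.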
The problem is that the response I get is the full page markup without the results. I also noticed that it comes with a new ViewState, and I believe that's why the query isn't returning the data. I don't want to use Selenium, because the server has no graphical interface and I need to look up a lot of numbers every day.
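When the server treats the POST as a JSF AJAX request, the reply is usually not a full page but a small XML `<partial-response>` document: each re-rendered component arrives in an `<update id="...">` element whose CDATA holds its HTML, and the new ViewState arrives in an `<update>` of its own. Getting the whole page back suggests the request was not recognised as AJAX. A new ViewState, on the other hand, is expected, and the next request should send that new value. The sketch below assumes the response has the shape of the sample in its test; `parse_partial_response` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_partial_response(xml_text):
    """Split a JSF <partial-response> into ({client_id: html}, new_view_state)."""
    root = ET.fromstring(xml_text)
    if root.tag != "partial-response":
        # A full XHTML page (or anything else) means this was not an AJAX reply.
        raise ValueError("not a JSF partial response: <%s>" % root.tag)
    updates = {}
    view_state = None
    for update in root.iter("update"):
        client_id = update.get("id", "")
        content = update.text or ""  # CDATA content ends up in .text
        # The ViewState update id contains "javax.faces.ViewState"
        # (possibly with a prefix/suffix such as "j_id1:...:0").
        if "javax.faces.ViewState" in client_id:
            view_state = content
        else:
            updates[client_id] = content
    return updates, view_state
```

For example, `updates, VS = parse_partial_response(req.content)` gives the HTML fragments keyed by component id (such as `FORM_myform:P_containerinfo`) and the ViewState to send with the next number.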
UPDATE: I believe the problem lies in JSF. I need to know how to handle the form data and the JSF values.