My code works well, I have no problem extracting what I need. my problem is in some difference coming from using the response of the web service to a different result of doing the same but with the value of the web service saved in a variable. I have this blocker for days, and I hope you please help.
NOTE: the suggested duplicate questions answers don't work for me, this isn't a duplicate question.
I'm consuming a web service. the answer I get is stored in the variable answerService
, this is a very long string and after this I extract what is inside the tag span
that has this structure:
<span style = "font-weight: bold"> xxx </ span>
"xxx" is what I want to extract
#with that I get the "xxx"
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', answerService)
I get an array of "n" length according to the span
existing with this structure.
If I do this directly from the web service it does not work and I only get this answer:
['áGILMENTE']
Now, if I put the response of the web service sameStringOfAnswer
in my code, the result is different:
print(arraySpan)
['ADV', 'áGILMENTE']
By logic the answer is the same and never changes, for some strange reason in real time when I get the response from the web service, I only get ['áGILMENTE']
when the answer I expect is ['ADV', 'áGILMENTE'
]
This is the key piece that shows that 2 span
is always coming with the structure I need:
Here is my code:
import requests
import re
session = requests.Session()
getId=session.get('http://cartago.lllf.uam.es/grampal/grampal.cgi')
cookie=session.cookies.get_dict()
getId=session.cookies.get_dict()
getId=getId["CGISESSID"]
#getting an ID for request a webservice
getService=requests.get("http://cartago.lllf.uam.es/grampal/grampal.cgi?m=analiza&csrf="+getId+"&e="+"ágilmente", cookies=cookie)
answerService=getService.text
#get the value of the <span>
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', answerService)
print(answerService)
print("array",arraySpan)
#same code but using the result of service web
sameStringOfAnswer='<html xmlns="http://www.w3.org/TR/REC-html40"><head><title>Grampal </title><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><meta name="Content-Language" content="EN"><meta name="author" content="jmguirao@ugr.es"><link rel="icon" type="image/ico" href="/favicon.ico"/><style type="text/css">html,body,form,ul,li,h1,h3,p{margin:0; padding:0}body{font-family: Arial, Helvetica, sans-serif; background-color:#fff}a{text-decoration: none;}a:hover{text-decoration: underline}ul{list-style-type: none}td{padding: 0.5pc 2pc 0pc 0pc}.nav{float: right; padding: 0.5pc 0.5pc 0.5pc 0.5pc; margin-left:5px}.nav li{display:inline; border-left: 1px solid #444; padding:0 0.4em;}.nav li.first{border-left:0}.hide{display:none}input{text-indent: 2px}input[type="submit"]{text-indent: 0}DIV.delPage{padding: 0.5ex 5em 0.5em 5em; background-color:#ffd6ba;}.delMain{padding: 2ex 0.5em 0.5pc 0.5em;}.post{margin-bottom: 0.25pc; font-size: 100%; padding-top: 0.5ex;}.posts, #posts{padding: 0.5ex 0.5em 0.5pc 50px;}.banner{padding: 0.5ex 0 0.5pc 0.5em;background-color: #ffc6aa;clear: both}.banner h1{font-weight: bolder; font-size: 150%;margin:0; padding:0 0 0 26px; display: inline;}h2{font-weight: bolder; font-size: 140%; color: red; margin:0; padding:0 0 0 26px; display: inline;}.resaltado{font-weight: bolder;font-size: 100%}</style></head><body><div class="banner"><ul class="hide"><li><a href="#content">skip to content</a></li></ul><ul class="nav">Análsis de:<li class="first"><a title="Analizador morfosintáctico" href="/grampal/grampal.cgi?m=analiza&e=ágilmente">palabras</a></li><li><a title="Desambiguador contextual" href="/grampal/grampal.cgi?m=etiqueta&e=ágilmente">oraciones</a></li><li><a title="Etiquetado de textos" href="/grampal/grampal.cgi?m=xml">textos</a></li><li><a title="Formas de una palabra" href="/grampal/grampal.cgi?m=genera&e=ágilmente">Generación de formas</a></li><!--<li><a title="Transcripción fonética" href="/grampal/grampal.cgi?m=transcribe&e=ágilmente">Transcripción</a></li>--><li><a href="/grampal/grampal.cgi?m=etiquetario">Etiquetario</a></li><li><a href="/grampal/grampal.cgi?m=autores">Autores</a></li></ul><h1>Grampal</h1></div><div class="delPage" style="font-size: 80%;"><form method="GET" action="/grampal/grampal.cgi"><input type="hidden" name="m" value="analiza"><input type="hidden" name="csrf" value="94508700a0ae409a90718299ae00b0e0"><span class="resaltado">Palabra : </span><input name="e" size="60" value="ágilmente"><input type="submit" value="Analiza"> </form></div><br><h2>ágilmente</h2><div class="delMain"><div id="posts"><table><tr><td style="font-style:italic;font-size:90%">categoría <span style="font-weight:bold"> ADV </span></td><td style="font-style:italic;font-size:90%">lema <span style="font-weight:bold"> áGILMENTE </span></td></tr></table></div></div></body></html>'
arraySpan = re.findall(r'<span style="font-weight:bold">(.*?)<', sameStringOfAnswer)
print(arraySpan)
What am I doing wrong?