I have searched this forum for a close match to my problem but could not find a suitable solution, so I am posting the question.
I am using the urllib and re modules to extract certain sections of a web page. I am also interested in the status associated with each of those sections.
For example, the source of the web page contains:
MY-TEXT #1410 finished subtask PREPARE-WORKSPACE #340418: https://cloud6.foo.bar.com/b/job/PREPARE-WORKSPACE/340418
'>SUCCESS
I am using re.compile and re.findall to extract the text that follows the pattern "https://cloud6.foo". This matches all of the expected text, which I have confirmed by inspecting the resulting list. However, I am losing the status of each task, because it appears on the line immediately after the "https://" line.
How can I extract the line immediately after the matched string in this scenario?
Here is the code snippet:
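from urllib import urlopen
import re

# urllink is the URL of the build page (set elsewhere)
webpage = urlopen(urllink).read()
# Capture everything after "<a href='https://" up to the end of that line
buildPhases = re.compile(r'\<a href=\W{1}https\W{3}(.*)')
phaseLists = buildPhases.findall(webpage)
for item in phaseLists:
    print item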
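
One direction that might work is to let the pattern continue across the line break and capture the status in a second group. The sketch below is only an illustration under assumptions: the sample string is a guess at what the raw markup looks like (an href value followed by a newline, then '>SUCCESS), and the names sample, phase_with_status and extract_phases are hypothetical. It runs on the inline sample rather than on the real page.

```python
import re

# Hypothetical sample mimicking the page source quoted above. The exact
# markup (a line break inside the href value, before the closing quote)
# is an assumption.
sample = (
    "MY-TEXT #1410 finished subtask PREPARE-WORKSPACE #340418: "
    "<a href='https://cloud6.foo.bar.com/b/job/PREPARE-WORKSPACE/340418\n"
    "'>SUCCESS\n"
)

# Group 1: the URL, made of non-whitespace characters only.
# \s* then consumes the line break, \W> matches the closing quote and '>',
# and group 2 captures the status word that follows on the next line.
phase_with_status = re.compile(r"<a href=\W(https\W{3}\S*)\s*\W>(\w+)")


def extract_phases(text):
    """Return a list of (url, status) tuples found in text."""
    return phase_with_status.findall(text)


for url, status in extract_phases(sample):
    print("%s -> %s" % (url, status))
```

With the real page, the webpage string from the snippet above would be passed to extract_phases in place of sample. Because (.*) stops at the end of a line, the original pattern can never reach the status; matching \s* between the URL and the closing quote is what lets this version step over the line break.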