I am downloading a web page and I am trying to extract some values from it.
The parts of the page that I am interested in look like this:
<a data-track="something" href="someurl" title="Heaven"><img src="somesource.jpg" /></a>
and I need to extract the href (someurl) value. Note that there are multiple entries like the one above in the HTML string that I have and thus I will use a list to store all the URLs that I extract from the string.
This is what I've tried so far:
QString html_str = myfile();
QRegExp regex("<a data-track\\=\"something\" href\\=\".*(?=\" title)");
if (regex.indexIn(html_str) != -1) {
    QStringList list;
    QString str;
    list = regex.capturedTexts();
    foreach (str, list)
        qDebug() << str.remove("<a data-track=\"something\" href=\"");
}
With the above code I get only one match (list.count() == 1), which contains the whole HTML string from the first occurrence of someurl to the end of the file, with every occurrence of <a data-track="something" href=" removed from it.
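
The behaviour described above is consistent with two things: `.*` is greedy, so it runs to the last `" title` in the input, and capturedTexts() reports the captures of a single match rather than every match in the string. Below is a minimal sketch of one possible approach, using std::regex as a stand-in for QRegExp so that it compiles without Qt. The function name extractHrefs is hypothetical, and the pattern assumes the attributes always appear in the order shown above.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects the href value of every
// <a data-track="something" href="..."> tag found in html.
std::vector<std::string> extractHrefs(const std::string& html) {
    // [^"]* stops at the first closing quote, so a match cannot
    // run past the end of the href value (unlike a greedy .*).
    static const std::regex re("<a data-track=\"something\" href=\"([^\"]*)\"");

    std::vector<std::string> urls;
    // sregex_iterator walks over every non-overlapping match in the input.
    for (std::sregex_iterator it(html.begin(), html.end(), re), end; it != end; ++it) {
        urls.push_back((*it)[1].str());  // capture group 1 is the URL only
    }
    return urls;
}
```

With QRegExp, the same two ideas would presumably translate to a capture group read with cap(1) (optionally with setMinimal(true) instead of the negated character class) and a loop that calls indexIn(html_str, pos) repeatedly, advancing pos by matchedLength() after each hit.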