While reading the book Web Scraping with Python, this regular expression confused me:
webpage_regex = re.compile('<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)
A link usually looks like:
<a href="/view/Afghanistan-1">
My confusion is this:
First, since [^>] means "any single character except >", why is it followed by a +? The + seems useless.
Second, in (.*?), since * already means "repeat 0 or more times", why does it need a ? after it to repeat the * again?
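To make the two questions concrete, here is a small experiment, assuming Python's standard re module. The sample html string and the names links, links_no_plus, and links_greedy are made up for illustration and are not from the book. It compares the original pattern with one variant that drops the + and one that drops the ? after the *.

```python
import re

# Hypothetical sample page with two links (not from the book).
html = (
    '<a href="/view/Afghanistan-1">Afghanistan</a> '
    '<a class="x" href="/view/Albania-2">Albania</a>'
)

# Original pattern from the book.
webpage_regex = re.compile('<a[^>]+href=["\'](.*?)["\']', re.IGNORECASE)
links = webpage_regex.findall(html)
print(links)  # ['/view/Afghanistan-1', '/view/Albania-2']

# Variant without +: [^>] now matches exactly one character, so only
# '<a href=...' (a single space before href) can match.
no_plus_regex = re.compile('<a[^>]href=["\'](.*?)["\']', re.IGNORECASE)
links_no_plus = no_plus_regex.findall(html)
print(links_no_plus)  # ['/view/Afghanistan-1']

# Variant without ?: .* is greedy and runs to the LAST quote it can find,
# swallowing everything between the first href and the end of the second.
greedy_regex = re.compile('<a[^>]+href=["\'](.*)["\']', re.IGNORECASE)
links_greedy = greedy_regex.findall(html)
print(len(links_greedy))  # 1
```

In this sketch the + lets [^>] cover the space and any other attributes between <a and href, and the ? after * does not repeat anything: it switches * from greedy to non-greedy matching, so the capture stops at the first closing quote.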