Extracting links from an XML file using Python

Question

I have a sitemap XML file and I want to run a script that extracts all the URLs and prints them. I have tried re.findall(r'(https?://\S+)', url)

I don't want to print the suffix ' /liv '. How do I implement this using regex?

Use an XML parser. XML must not be processed with regular expressions. — Tomalak, May 16 '18 at 06:23
[**You should not use regex to process XML (or any other structured data)!**](https://stackoverflow.com/a/1732454/7553525) — zwer, May 16 '18 at 06:35
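
A minimal sketch of the parser-based approach the comments recommend, using the standard library's xml.etree.ElementTree. It assumes the file follows the usual sitemap layout (a <urlset> containing <url><loc>...</loc></url> entries). The sample content and the function name extract_urls are made up for illustration, since the question doesn't show the actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sitemap content; the real file from the question is not shown.
SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/about </loc></url>
</urlset>"""


def extract_urls(xml_text):
    """Return the text of every <loc> element, whatever its namespace."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Namespaced tags look like '{namespace-uri}loc'; keep the local name only.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


for url in extract_urls(SITEMAP):
    print(url)
```

To read from a file instead of a string, ET.parse(path).getroot() can replace the ET.fromstring call. Because only element text is returned, surrounding markup never ends up in the result.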

wizzwizz4 · Accepted Answer · 2018-05-16T06:51:03.460

Are all of the URLs wrapped in quotation marks or surrounded by spaces? If so, you could do something like:

re.findall(r'(?P<quote>.)(https?://\S+?)(?P=quote)', url)

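Because the pattern has two groups, re.findall returns a (quote, url) tuple for each match, so take the second element of each tuple. If you're getting the string representation of everything matched, instead of just the second group, you'll have to trim it with ...[1:-1].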
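
Here is a small sketch of how that pattern behaves. The input string and the helper name quoted_urls are hypothetical. It uses re.finditer so the second group can be picked out directly.

```python
import re

# Same pattern as above: one delimiter character, the URL, then the same delimiter again.
PATTERN = re.compile(r'(?P<quote>.)(https?://\S+?)(?P=quote)')

# Hypothetical input with URLs delimited by double quotes, single quotes, and spaces.
text = 'see "https://example.com/a" and \'https://example.com/b\' or https://example.com/c here'


def quoted_urls(s):
    """Return only the URL part (group 2) of each match."""
    return [match.group(2) for match in PATTERN.finditer(s)]


print(quoted_urls(text))

# re.findall returns (delimiter, url) tuples, so the URL is the second element.
print([pair[1] for pair in re.findall(PATTERN, text)])
```

Note that a URL at the very start or end of the string has no delimiter on one side, so this pattern will not match it.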

edited May 16 '18 at 06:51

answered May 16 '18 at 06:29
