I am trying to use a regex to get the URLs from a text file. The file, locations.txt, contains XML saved in .txt format. This is its content:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.apple.com/jp/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/ph/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/hk-zh/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/kr/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/nz/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/th/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/sg/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/au/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/my/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/tw/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/cn/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/hk/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/uk/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/be-nl/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/it/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/lu/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/hu/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/at/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/cz/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/fi/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/tr/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/de/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/es/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/ie/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/pl/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/se/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/ae/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/be-fr/shop/sitemap-index.xml</loc></sitemap>
  <sitemap><loc>https://www.apple.com/dk/shop/sitemap-index.xml</loc></sitemap>
  <sitemap>
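Since the file content is XML with a default namespace, a parser such as the standard library's `xml.etree.ElementTree` can pull out the `<loc>` values without a regex. This is an alternative sketch, not the approach from the question. It assumes the file holds only well-formed XML saved directly (not copied from a browser view); because the excerpt above is cut off mid-element, the example uses a short complete sample instead, and `extract_locs` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <sitemapindex> in the sitemap file.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text):
    """Return the text of every <loc> element in a sitemap index (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Child elements inherit the default namespace, so the path needs the "sm:" prefix.
    return [loc.text.strip() for loc in root.iterfind(".//sm:loc", NS) if loc.text]


sample = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap><loc>https://www.apple.com/jp/shop/sitemap-index.xml</loc></sitemap>
<sitemap><loc>https://www.apple.com/ph/shop/sitemap-index.xml</loc></sitemap>
</sitemapindex>"""

print(extract_locs(sample))
```

For the real file, passing the result of reading locations.txt to `extract_locs` should work under the same well-formedness assumption; a parser error would indicate extra non-XML text in the file.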
The script I am using:
import re
re.findall('<(loc)>(https?://)([^\s]+)(</\1>)', open('locations.txt', 'r').read())
But there is no output.
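Two things likely explain the empty result. First, the pattern is a plain string literal, so Python turns `\1` into the control character `chr(1)` before `re` ever sees it; the pattern then looks for `</` followed by that character and never matches `</loc>`. Writing the pattern as a raw string (`r'...'`) keeps the backreference intact. Second, a bare `re.findall(...)` expression only displays its result in an interactive session; in a script the returned list has to be printed. Even with a raw string, the four capturing groups would make `findall` return 4-tuples rather than plain URLs. A minimal sketch with a single group, assuming each URL sits directly inside `<loc>` and contains no `<` characters (`extract_urls` is a hypothetical helper name):

```python
import re

# Raw string: backslash sequences reach the re module unchanged.
# A single capturing group, so findall returns plain strings instead of tuples.
# [^<\s]+ stops at the next '<', so one match never spans two <loc> elements.
LOC_URL = re.compile(r"<loc>\s*(https?://[^<\s]+)\s*</loc>")


def extract_urls(text):
    """Return every http(s) URL enclosed in <loc>...</loc> (hypothetical helper)."""
    return LOC_URL.findall(text)


# The bug in the original pattern: in a plain string literal, '\1' is
# the control character chr(1), not a regex backreference.
assert "</\1>" == "</\x01>"

sample = (
    "<sitemap> <loc>https://www.apple.com/jp/shop/sitemap-index.xml</loc> </sitemap>"
    "<sitemap><loc>https://www.apple.com/ph/shop/sitemap-index.xml</loc></sitemap>"
)
print(extract_urls(sample))

# Usage on the real file, assuming locations.txt is in the working directory:
# with open("locations.txt", encoding="utf-8") as f:
#     print(extract_urls(f.read()))
```

Using `[^<\s]+` instead of `[^\s]+` avoids relying on backtracking to find the closing tag, which matters when several `<loc>` elements appear on one line without whitespace between them.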