0
<?xml version='1.0' encoding='UTF-8'?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://google.com/2020/08/this1.html</loc><lastmod>2020-08-06T11:30:55Z</lastmod></url>
<url><loc>https://google.com/2020/08/this2.html</loc><lastmod>2020-08-05T11:30:06Z</lastmod></url>
<url><loc>https://google.com/2020/08/this3.html</loc><lastmod>2020-08-06T11:29:25Z</lastmod></url>
</urlset>

I'm trying to extract, from the XML above, the links whose lastmod is 2020-08-06. My regex is https:.+2020-08-05.+<\/url

but it ends up matching everything from the first link through the last one.

I want to match only:

<url><loc>https://google.com/2020/08/this1.html</loc><lastmod>2020-08-06T11:30:55Z</lastmod></url>
<url><loc>https://google.com/2020/08/this3.html</loc><lastmod>2020-08-06T11:29:25Z</lastmod></url>

2 Answers

0

A very easy and stupid regex - see regexr:

.*<lastmod>2020-08-06.*
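
As a rough sketch of how this pattern could be applied, here is a Python version using the standard `re` module. The `SITEMAP` string is just the sample from the question (with the closing tag tidied up), and the variable names are made up for illustration. It assumes each `<url>` entry sits on its own line, because `.` does not cross newlines by default.

```python
import re

# Sample sitemap copied from the question, one <url> entry per line (assumption).
SITEMAP = """<?xml version='1.0' encoding='UTF-8'?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://google.com/2020/08/this1.html</loc><lastmod>2020-08-06T11:30:55Z</lastmod></url>
<url><loc>https://google.com/2020/08/this2.html</loc><lastmod>2020-08-05T11:30:06Z</lastmod></url>
<url><loc>https://google.com/2020/08/this3.html</loc><lastmod>2020-08-06T11:29:25Z</lastmod></url>
</urlset>"""

# Without re.DOTALL, "." stops at a newline, so each match is one whole line.
matching_lines = re.findall(r".*<lastmod>2020-08-06.*", SITEMAP)

for line in matching_lines:
    print(line)
```

If the whole sitemap arrives on a single line, this pattern would match that entire line, so it only helps when the entries are line-separated.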
maio290
0
/<loc>(.+)<\/loc>.*2020-08-06/g

This captures the group between the loc tags. Demo and explanation here: https://regex101.com/r/HBvG3K/8
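
For anyone doing this outside a regex tester, a minimal Python sketch of the same idea might look like the following. `re.findall` plays the role of the `/g` flag, and the `SITEMAP` string and variable names are illustrative, taken from the question's sample. It assumes one `<url>` entry per line, and note that the pattern accepts the date anywhere after `</loc>` on that line, not strictly inside `<lastmod>`.

```python
import re

# Sample sitemap copied from the question, one <url> entry per line (assumption).
SITEMAP = """<?xml version='1.0' encoding='UTF-8'?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>https://google.com/2020/08/this1.html</loc><lastmod>2020-08-06T11:30:55Z</lastmod></url>
<url><loc>https://google.com/2020/08/this2.html</loc><lastmod>2020-08-05T11:30:06Z</lastmod></url>
<url><loc>https://google.com/2020/08/this3.html</loc><lastmod>2020-08-06T11:29:25Z</lastmod></url>
</urlset>"""

# The single capture group means findall returns only the text between the loc tags.
pattern = re.compile(r"<loc>(.+)</loc>.*2020-08-06")
urls = pattern.findall(SITEMAP)

print(urls)
# → ['https://google.com/2020/08/this1.html', 'https://google.com/2020/08/this3.html']
```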

rootkonda