I am trying to batch process (search and replace) a couple hundred thousand HTML pages with regex in Notepad++. All the HTML pages have the exact same layout, and I am basically trying to copy an element (a title) into the page's title tag, which isn't currently empty:
<html>
<head>
<title>some title</title>
<lots of junk and newlines>
</head>
<body>
<lots of stuff, tags, content><span>stuff</span><div>more stuff</div>
<div id="uniqueID">
<span>The Title that should be copied into head's title tag</span>
</div>
...other stuff...</body>
I can find:
The title tag: <title>(.*?)</title>
And the span containing the REAL title:
(\s*<div id="uniqueID">\s*)<span>(.*)</span>(\s*</div>)
But I can't seem to fit them into one expression (ignoring the junk in between) so that I can search and replace it in Notepad++.
The uniqueID div is the same in every page (spaces, newlines), and there is nothing else in it other than the span with its content. The title tag is obviously present only once in every page. I just started with regular expressions and the possibilities are endless. I know regex is not ideal for parsing HTML, but for this case it should work. Does anyone know how to patch these two expressions together so that they ignore the in-between content?
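To make the goal concrete, here is a minimal sketch of how the two expressions might be joined, written in Python only so that it can be tried outside Notepad++. It assumes the title tag always comes before the uniqueID div and that "." is allowed to match newlines; the names COMBINED and copy_title are made up for illustration. The same idea would presumably carry over to Notepad++ with ". matches newline" ticked and a replacement along the lines of \1\4\3\4\5, but I have not verified that.

```python
import re

# Hypothetical combined pattern (names and grouping are illustrative only):
#   group 1: the opening <title> tag
#   group 2: the old title text (discarded)
#   group 3: everything from </title> up to and including the target <span>
#   group 4: the real title inside the span
#   group 5: the closing </span>
# re.DOTALL lets "." match newlines so the lazy ".*?" can skip the junk in between.
COMBINED = re.compile(
    r'(<title>)(.*?)(</title>.*?<div id="uniqueID">\s*<span>)(.*?)(</span>)',
    re.DOTALL,
)


def copy_title(html: str) -> str:
    """Return html with the span's text copied into the <title> tag."""
    # Rebuild the match, substituting group 4 where group 2 used to be.
    return COMBINED.sub(r"\g<1>\g<4>\g<3>\g<4>\g<5>", html, count=1)


if __name__ == "__main__":
    sample = (
        "<html>\n<head>\n<title>some title</title>\n<meta charset=\"utf-8\">\n"
        "</head>\n<body>\n<div>more stuff</div>\n"
        "<div id=\"uniqueID\">\n<span>The Real Title</span>\n</div>\n"
        "...other stuff...</body>"
    )
    print(copy_title(sample))
```

The lazy quantifiers matter here: a greedy ".*" for the span text would run on to the last </span> in the file once "." matches newlines.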
Thank you so much!