
EDIT: I am using PCRE RegEx language for now.

I have VBScript string assignments at the top of every page on my site. (The site is in the middle of a redesign.) I need to use these values in a RegEx search and replace, so that another part of the page's HTML is given that string value.

The filter below successfully extracts "Member Access" from the variable at the top of the page, and I can use $1 to place that value somewhere. This is where I get stuck: I need to put that value somewhere else, such as in the <title> tag. What do I type in the replace field to keep everything, but replace only certain items, such as the text between the title tags?

I basically need to find two things: find the first, then use it as the replacement for the second:
<title>this text</title>

 RegEx filter: /PageTitle = "(.*)"/ gm
 Replacement string: <everything before Page Title string>PageTitle = "$1"<everything after PageTitle string><title>$1</title><Rest of content after title tag>

Here's example of what each page on my site looks like:

<% 
Page Title = "Member Access"
MetaDescription = "This is a paragraph describing our website that we use to place into the meta description tag in the head. This will give information about our site."
Keywords = "Awesome, Cool, Rad, Tubular"
%>

<!doctype HTML>
<html dir="ltr" lang="en">
<head>
<meta charset="UTF-8">

<!-- Meta Tags -->
<meta name="description" content="This needs to be replaced with MetaDescription variable at top of page">
<meta name="keywords" content= "these, need, to, be, gone">
<meta name="viewport" content="width=device-width, initial-scale=1.0 shrink-to-fit=no">


<!-- Twitter and Facebook Social media tags -->
<meta property="fb:app_id" content="" />
<meta property="og:title" content="This needs to be replace with Page Title variable at top of page" >
<meta property="og:description" content="This needs to be replaced with MetaDescription variable at top of page">

 <!-- Page Title -->
 <title>This needs to be replaced with Page Title variable at top of page</title>


 </head>

 <body>

 <div id="main" class="main-content">
 <section class="inner-header divider layer-overlay overlay-dark-4" data-bg-img="/images/_interior-banners/THIS NEEDS TO BE REPLACED CONDITIONALLY BASED ON SITE FOLDER" style="background-image: url('/images/_interior-banners/THIS NEEDS TO BE REPLACED CONDITIONALLY BASED ON SITE FOLDER'); ">

 <h1 id="page-title" class="font-36">This needs to be replaced by Page Title variable at top of page</h1>

 rest of webpage content......
 </div>
 </section>
 </body>
 </html>
  • I'm confused. Can't you show a concrete example of input and expected output (with replacement)? – Poul Bak May 18 '20 at 14:49
  • I guess you need to match multiple parts and then replace just some of them with the "title" group. E.g. `(Page Title = "([^"]*)")(.*)(<title>)([^<])(</title>)(.*)(<h1 id="page-title" class="font-36">)([^<]*)(</h1>)`, where the title match group is $2, with replacement $1$3$4$2$6$7$8$2$10. That doesn't work exactly, though, so maybe you or someone else can fix it?
    – James S May 18 '20 at 14:50
  • Obligatory link about the futility of trying to [parse XML with regex](https://stackoverflow.com/a/1732454/62576) instead of using a DOM parser. – Ken White May 18 '20 at 15:15
  • @PoulBak, well, knowing what input to use is my question. But one concrete example of expected output would be: `Member Access`, where "Member Access" is what was found by searching for what the Page Title variable contained. The variable will be different on every page.
    – codewelldesign May 18 '20 at 15:49

1 Answer

OK, you need to match multiple parts of the page, then replace most of them with themselves and just some of them with the matched "title" group.

Here's the regex that works (in Notepad++ with ". matches newline" ON):

(Page Title = "([^"]*)")(.*)(<title>)([^<]*)(</title>)(.*)(<h1 id="page-title" class="font-36">)([^<]*)(</h1>)

So that gives groups:

$1 (Page Title = "([^"]*)") - The first bit  
$2 ([^"]*) - INSIDE $1 - the thing we are wanting to use as replacements elsewhere  
$3 (.*) - everything up until....   
$4 (<title>)  
$5 ([^<]*) - inside the title tag (ie we want to replace this)  
$6 (</title>) - title closing tag  
$7 (.*) - everything up until...  
$8 (<h1 id="page-title" class="font-36">) - h1 opening tag  
$9 ([^<]*) - inside the h1 tag (ie we want to replace this)  
$10 (</h1>)

Note the use of negated character classes: the $2 group, [^"]*, matches any number of characters that are NOT a ". This is important because quantifiers are greedy by default, and we want that group to stop at the first " and then move on to the next group.
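To make the greedy-versus-negated-class point concrete, here is a minimal sketch in Python. Python's `re` module is an assumption on my part (the question uses PCRE and this answer was tested in Notepad++), but both patterns behave the same way there. The sample line is made up for illustration.

```python
import re

# Hypothetical one-line sample with two quoted values on the same line.
line = 'Page Title = "Member Access" Keywords = "Awesome, Cool"'

# Greedy .* runs to the end of the line, then backtracks only as far as
# the LAST double quote, so it swallows the Keywords assignment too.
greedy = re.search(r'Page Title = "(.*)"', line).group(1)

# [^"]* cannot cross a double quote, so it stops at the first closing quote.
negated = re.search(r'Page Title = "([^"]*)"', line).group(1)

print(greedy)   # Member Access" Keywords = "Awesome, Cool
print(negated)  # Member Access
```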

So our replacement is:

$1$3$4$2$6$7$8$2$10
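If you want to try the same search and replace outside an editor, here is a rough sketch in Python. This is an assumption, not what I actually ran: Python's `re` is neither PCRE nor the Notepad++ engine, `re.DOTALL` stands in for ". matches newline", and `\g<N>` is Python's spelling of `$N`. The `page` string is a cut-down, made-up version of the page in the question.

```python
import re

# Cut-down, hypothetical version of the page from the question.
page = '''<%
Page Title = "Member Access"
%>
<head>
<title>Old title</title>
</head>
<body>
<h1 id="page-title" class="font-36">Old heading</h1>
</body>
'''

# Same ten groups as above; re.DOTALL lets "." match newlines.
pattern = re.compile(
    r'(Page Title = "([^"]*)")(.*)(<title>)([^<]*)(</title>)(.*)'
    r'(<h1 id="page-title" class="font-36">)([^<]*)(</h1>)',
    re.DOTALL,
)

# Groups 5 and 9 (the old title and heading text) are swapped for group 2;
# every other group is written back unchanged.
replacement = r'\g<1>\g<3>\g<4>\g<2>\g<6>\g<7>\g<8>\g<2>\g<10>'

result = pattern.sub(replacement, page, count=1)
print(result)
```

Anything before "Page Title" or after the closing h1 tag is outside the match, so it is left exactly as it was.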
James S
  • James, thanks! I think this is really close. Would this also work if there was text before "Page Title =..."? I've tried your example in Notepad++ and was not successful. – codewelldesign May 18 '20 at 15:35
  • James, can you get it to work here? Did it actually work in your Notepad version? https://regex101.com/r/pNXIPO/1 – codewelldesign May 18 '20 at 15:45
  • @codewelldesign I never did get it to work in the regex tester, but it worked OK in Notepad++ v7.6 (see https://pasteboard.co/J8XDHlP.png ). I know it's not the latest, but I'd be surprised if the regex engine were any different... still, it looks identical to your screenshot, so maybe... – James S May 18 '20 at 15:52
  • James, thanks! I was able to get your code working on regex101.com. I appreciate your help! This definitely helps me move forward. https://regex101.com/r/KoBmHA/1 – codewelldesign May 18 '20 at 17:01