-4

Alright, so basically I would like to search the body tag for {~, then get whatever follows it up to ~} and turn that into a string (not including the {~ or ~}).

McMilan

3 Answers

2

const match = document.body.innerHTML.match(/\{~(.+?)~\}/);
if (match) console.log(match[1]);
else console.log('No match found');
<body>text {~inner~} text </body>
CertainPerformance
2

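$(function () {
  // Grab the body's HTML as a single string
  var bodyText = document.body.innerHTML;

  // match() returns null when nothing matches, so fall back to an empty array
  var found = bodyText.match(/{~(.*?)~}/gi) || [];

  $.each(found, function (index, value) {
    // Each value is a full match, so strip the {~ and ~} delimiters
    var ret = value.replace(/{~/g, '').replace(/~}/g, '');
    console.log(ret);
  });
});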
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
   <body> {~Content 1~}

{~Content 2~}
</body>

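There you go, put gi at the end of the regex. The g flag is the one that matters: it makes match() return every occurrence instead of only the first. The i flag (case-insensitive) is harmless here, since the pattern contains no letters.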
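
One detail worth spelling out: with the g flag, match() returns only the full matches, delimiters included, which is why the snippet above strips {~ and ~} afterwards. A minimal sketch of an alternative, assuming an environment that supports String.prototype.matchAll (ES2020). The helper name extractTags is made up for illustration and takes a plain string rather than reading the DOM itself:

```javascript
// Hypothetical helper (not part of the answer above): returns the text
// between each {~ and ~} pair found in a string.
function extractTags(text) {
  // matchAll requires the g flag and yields one match object per hit;
  // index 1 of each match object is the first capture group.
  return Array.from(text.matchAll(/\{~(.*?)~\}/g), (m) => m[1]);
}

const sample = '<body> {~Content 1~}\n\n{~Content 2~}\n</body>';
console.log(sample.match(/\{~(.*?)~\}/g)); // full matches, delimiters included
console.log(extractTags(sample));          // captured text only
```

In the browser you would pass document.body.innerHTML as the argument.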

Wils
  • You don't need to install a heavyweight library like jQuery just for iteration - also, if you're just selecting a single element, better to use `querySelector` than to use one of the methods that returns a collection and then select the first element in the collection. – CertainPerformance Apr 03 '18 at 02:27
  • 2
    Did he tag the post jQuery? I guess – Wils Apr 03 '18 at 02:29
  • If the text between the `{~` and `~}` contains a space, such as `{~Content 1~}`, your regex will fail... `\w` only matches ASCII based characters, so any Unicode characters outside the ASCII range would cause it to fail too. – Useless Code Apr 03 '18 at 05:05
  • @UselessCode fixed. – Wils Apr 03 '18 at 05:51
1

This is a harder problem to solve than it would first appear: things like script tags and comments can throw a wrench into things if you just grab the innerHTML of the body. The following function takes a base element to search (in your case, document.body) and returns an array of the strings it finds.

function getMyTags (baseElement) {
  const rxFindTags = /{~(.*?)~}/g;

  // .childNodes contains not only elements, but any text that
  // is not inside of an element, comments as their own node, etc.
  // We will need to filter out everything that isn't a text node
  // or a non-script tag.
  let nodes = baseElement.childNodes;
  let matches = [];
  
  nodes.forEach(node => {
    let nodeType = node.nodeType;
    // if this is a text node, or an element that is not a script tag
    if (nodeType === 3 || (nodeType === 1 && node.nodeName !== 'SCRIPT')) {
      let html;
      if (node.nodeType === 3) { // text node
        html = node.nodeValue;
      } else { // element
        html = node.innerHTML; // or .innerText if you don't want the HTML
      }

      let match;
      // search the html for matches until it can't find any more
      while ((match = rxFindTags.exec(html)) !== null) {
        // the [1] is to get the first capture group, which contains
        // the text we want
        matches.push(match[1]);
      }
    }
  });

  return matches;

}

console.log('All the matches in the body:', getMyTags(document.body));
console.log('Just in header:', getMyTags(document.getElementById('title')));
<h1 id="title"><b>{~Foo~}</b>{~bar~}</h1>
Some text that is {~not inside of an element~}
<!-- This {~comment~} should not be captured -->
<script>
 // this {~script~} should not be captured
</script>
<p>Something {~after~} the stuff that shouldn't be captured</p>

The regular expression /{~(.*?)~}/g works like this:

  • {~ start our match at {~
  • (.*?) capture anything after it; the ? makes it "non-greedy" (also known as "lazy"), so if there are two instances of {~something~} in a string we are searching, it captures each one individually instead of capturing from the first {~ to the last ~} in the string.
  • ~} says there has to be a ~} after our match.

The g option makes it a 'global' search, meaning it will look for all matches in the string, not just the first one.
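
To make the difference concrete, here is a small sketch comparing the greedy, lazy, and global variants on one string. The sample text and variable names are made up for illustration:

```javascript
const text = 'a {~one~} b {~two~} c';

// Greedy: .* runs to the last ~} it can find, so both tags merge into one capture.
const greedy = /{~(.*)~}/.exec(text)[1];

// Lazy: .*? stops at the first ~} that completes the match.
const lazy = /{~(.*?)~}/.exec(text)[1];

// Lazy + g: repeated exec() calls walk through every match in turn.
const rx = /{~(.*?)~}/g;
const all = [];
let m;
while ((m = rx.exec(text)) !== null) {
  all.push(m[1]);
}

console.log(greedy); // one~} b {~two
console.log(lazy);   // one
console.log(all);    // ['one', 'two']
```

Note that . does not match line breaks by default, so a tag whose content spans several lines would need the s flag or a character class such as [\s\S] in place of the dot.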

Further reading

Tools

There are lots of different tools out there to help you develop regular expressions. Here are a couple I've used:

  • RegExr has a great tool that explains how a particular regular expression works.
  • RegExPal
Useless Code
  • The one downside to this approach over just grabbing the `.innerHTML` of the body is that it won't capture something that has the `{~` and `~}` spread across different nodes or elements. – Useless Code Apr 03 '18 at 05:10