Alright, so basically I would like to search inside the body tag for `{~`, take whatever follows it up to `~}`, and turn that into a string (not including the `{~` or `~}`).
-4
-
Where is the `{~`? Do you want to look through the HTML source, or what? – CertainPerformance Apr 03 '18 at 01:54
-
Anywhere inside the tags – McMilan Apr 03 '18 at 01:56
-
Do you only want the text or the HTML too? – Useless Code Apr 03 '18 at 02:30
3 Answers
2
const match = document.body.innerHTML.match(/\{~(.+)~\}/);
if (match) console.log(match[1]);
else console.log('No match found');
<body>text {~inner~} text </body>
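If the page can contain more than one tag, note that the snippet above returns only the first match, and its greedy `(.+)` would run from the first `{~` to the last `~}` on a line. A minimal sketch of one way to collect every occurrence, assuming an environment that supports `String.prototype.matchAll`; `extractAllTags` is a hypothetical helper name, not part of the answer:

```javascript
// Hypothetical helper: returns the text of every {~...~} tag in a string.
function extractAllTags(html) {
  // (.+?) is lazy, so each match stops at the nearest closing ~}.
  // matchAll requires the g flag.
  const pattern = /\{~(.+?)~\}/g;
  // m[1] is the first capture group, i.e. the text between the delimiters.
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

// In a browser you would call: extractAllTags(document.body.innerHTML)
console.log(extractAllTags('text {~first~} more {~second~} text'));
```

Because `.` does not match line breaks, this sketch only finds tags that open and close on the same line.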

CertainPerformance
-
With this, you can only find one match, even if the HTML contains more than one. – Wils Apr 03 '18 at 02:28
-
Edited into a snippet which runs fine, don't know why it wouldn't work for you – CertainPerformance Apr 03 '18 at 02:31
2
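$(function () {
  var bodyText = document.getElementsByTagName("body")[0].innerHTML;
  // match() returns null when nothing matches, so fall back to an empty array
  var found = bodyText.match(/{~(.*?)~}/gi) || [];
  $.each(found, function (index, value) {
    // with the g flag, match() returns whole matches, so strip the delimiters
    var ret = value.replace(/{~/g, '').replace(/~}/g, '');
    console.log(ret);
  });
});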
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
<body> {~Content 1~}
{~Content 2~}
</body>
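There you go: put `gi` at the end of the regex. The `g` flag is what makes `match` return every occurrence; `i` only makes the match case-insensitive, which makes no difference for this pattern.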
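For comparison, here is a rough jQuery-free sketch of the same idea. With the `g` flag, `String.prototype.match` returns the full matched strings rather than the capture groups, and it returns `null` when nothing matches, so the sketch falls back to an empty array and strips the two-character delimiters itself. `extractWithGlobalMatch` is a hypothetical name used only for illustration:

```javascript
// Hypothetical helper: same approach as above, without jQuery.
function extractWithGlobalMatch(text) {
  // With g, match() yields whole matches such as "{~Content 1~}",
  // or null if there are none.
  const found = text.match(/\{~(.*?)~\}/g) || [];
  // Drop the leading "{~" and trailing "~}" from each whole match.
  return found.map((full) => full.slice(2, -2));
}

// In a browser you would call: extractWithGlobalMatch(document.body.innerHTML)
console.log(extractWithGlobalMatch(' {~Content 1~}\n{~Content 2~}\n'));
```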

Wils
-
You don't need to install a heavyweight library like jQuery just for iteration - also, if you're just selecting a single element, better to use `querySelector` than to use one of the methods that returns a collection and then select the first element in the collection. – CertainPerformance Apr 03 '18 at 02:27
-
2
-
If the text between the `{~` and `~}` contains a space, such as `{~Content 1~}`, your regex will fail... `\w` only matches ASCII based characters, so any Unicode characters outside the ASCII range would cause it to fail too. – Useless Code Apr 03 '18 at 05:05
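To illustrate the point in this comment: the `\w`-based pattern it refers to is not shown on this page, so the `wordOnly` pattern below is only an assumption about what it might have looked like. The sketch contrasts it with the `.*?` pattern used in the answer:

```javascript
// Assumed shape of the earlier pattern (not taken from the answer itself).
const wordOnly = /\{~(\w+)~\}/;
// Pattern from the answer: lazily matches any characters except line breaks.
const anyChars = /\{~(.*?)~\}/;

// \w only covers A-Z, a-z, 0-9 and underscore, so a space or "é" breaks the match.
console.log(wordOnly.exec('{~Content 1~}')); // no match
console.log(wordOnly.exec('{~héllo~}'));     // no match
console.log(anyChars.exec('{~Content 1~}')[1]);
console.log(anyChars.exec('{~héllo~}')[1]);
```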
-
1
This is a harder problem to solve than it would first appear; things like script tags and comments can throw a wrench into things if you just grab the `innerHTML` of the body. The following function takes a base element to search (in your case you'll want to pass in `document.body`) and returns an array containing any of the strings found.
function getMyTags (baseElement) {
  const rxFindTags = /{~(.*?)~}/g;
  // .childNodes contains not only elements, but any text that
  // is not inside of an element, comments as their own node, etc.
  // We will need to filter out everything that isn't a text node
  // or a non-script tag.
  let nodes = baseElement.childNodes;
  let matches = [];
  nodes.forEach(node => {
    let nodeType = node.nodeType;
    // if this is a text node (3), or an element (1) that is not a script tag
    if (nodeType === 3 || (nodeType === 1 && node.nodeName !== 'SCRIPT')) {
      let html;
      if (nodeType === 3) { // text node
        html = node.nodeValue;
      } else { // element
        // note: innerHTML still includes any comments or scripts
        // nested inside this element
        html = node.innerHTML; // or .innerText if you don't want the HTML
      }
      let match;
      // search the html for matches until it can't find any more
      while ((match = rxFindTags.exec(html)) !== null) {
        // the [1] is to get the first capture group, which contains
        // the text we want
        matches.push(match[1]);
      }
    }
  });
  return matches;
}
console.log('All the matches in the body:', getMyTags(document.body));
console.log('Just in header:', getMyTags(document.getElementById('title')));
<h1 id="title"><b>{~Foo~}</b>{~bar~}</h1>
Some text that is {~not inside of an element~}
<!-- This {~comment~} should not be captured -->
<script>
// this {~script~} should not be captured
</script>
<p>Something {~after~} the stuff that shouldn't be captured</p>
The regular expression `/{~(.*?)~}/g` works like this:
- `{~` start our match at `{~`
- `(.*?)` capture anything after it; the `?` makes it "non-greedy" (also known as "lazy") so, if you have two instances of `{~something~}` in any of the strings we are searching, it captures each individually instead of capturing from the first `{~` to the last `~}` in the string.
- `~}` says there has to be a `~}` after our match.
The `g` option makes it a 'global' search, meaning it will look for all matches in the string, not just the first one.
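A small sketch of the difference between the greedy and lazy forms, and of what the `g` flag adds. The `sample` string and variable names are made up for illustration:

```javascript
const sample = 'a {~one~} b {~two~} c';

// Greedy: (.*) grabs as much as it can, so the capture runs from the
// first {~ all the way to the last ~}.
const greedy = sample.match(/\{~(.*)~\}/)[1];

// Lazy: (.*?) stops at the nearest ~}; without g only the first tag is found.
const lazyFirst = sample.match(/\{~(.*?)~\}/)[1];

// Lazy plus g: every tag is found, each captured individually.
const lazyAll = Array.from(sample.matchAll(/\{~(.*?)~\}/g), (m) => m[1]);

console.log(greedy);    // one~} b {~two
console.log(lazyFirst); // one
console.log(lazyAll);   // [ 'one', 'two' ]
```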
Further reading
- childNodes
- nodeType
- Regular-Expressions.info has a great regular expression tutorial.
- MDN RegExp documentation
Tools
There are lots of different tools out there to help you develop regular expressions. Here are a couple I've used:

Useless Code
-
The one downside to this approach over just grabbing the `.innerHTML` of the body is that it won't capture something that has the `{~` and `~}` spread across different nodes or elements. – Useless Code Apr 03 '18 at 05:10
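One possible way around that limitation, sketched under the assumption that only the text (not the markup) between the delimiters is wanted: walk the tree recursively, concatenate the text nodes while skipping comments and `<script>`/`<style>` contents at any depth, and run the regex once over the combined string. `collectVisibleText` and `getTagsAcrossNodes` are hypothetical names, not part of the answer:

```javascript
// Concatenate the value of every text node under `root`, skipping comments
// and anything inside <script> or <style> elements, at any depth.
function collectVisibleText(root) {
  let text = '';
  for (const node of root.childNodes) {
    if (node.nodeType === 3) { // text node
      text += node.nodeValue;
    } else if (node.nodeType === 1 && // element
               node.nodeName !== 'SCRIPT' && node.nodeName !== 'STYLE') {
      text += collectVisibleText(node);
    }
    // comments (nodeType 8) and other node types are ignored
  }
  return text;
}

// Run the pattern once over the combined text, so a tag whose {~ and ~}
// sit in different nodes is still found.
function getTagsAcrossNodes(root) {
  // [\s\S] also matches line breaks, which `.` does not
  const pattern = /\{~([\s\S]*?)~\}/g;
  return Array.from(collectVisibleText(root).matchAll(pattern), (m) => m[1]);
}

// In a browser you would call: getTagsAcrossNodes(document.body)
```

The trade-off is that text from neighbouring elements is joined together, so an unmatched `{~` in one element could pair with a `~}` in a later one.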