I have a markup string, for example:

 var text = '<div>\frac{5}{6}</div>'

I want to get the text between the div tags with this:

var inBetween = text.replace(/<div>(.*?)<\/div>/g,'$1');
console.log(inBetween);

But this outputs rac{5}{6} instead of \frac{5}{6}. How can I fix this?

Fodder
  • The string is ok. It just outputs it with the `\f` as some sort of hidden char. If you care for it, you can `JSON.stringify` first – IT goldman Jul 30 '22 at 21:53
  • `String.raw` maybe? – Konrad Jul 30 '22 at 21:54
  • `\f` is a form feed. Which is preserved using your regex but logs as... a form feed. see: [Character_Classes: Types](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions/Character_Classes#types) – pilchard Jul 30 '22 at 21:56
  • try `console.log(text)`, you'll see it has nothing to do with regex. You must escape backslashes in string if you want to preserve them, otherwise javascript treats it as escape character – vanowm Jul 30 '22 at 22:02
  • @pilchard you're right, I changed the letter f to something else and it works. Welp I'd find a workaround then. Thanks – Fodder Jul 30 '22 at 22:03
  • Note that this is only an issue for string literals in the code. If you're getting the data from an API or the DOM, escape sequences aren't processed. – Barmar Jul 30 '22 at 22:04
  • You're trying to parse HTML, so why are you using regex for this at all? The "don't do this" post on that is arguably [the most famous post on Stackoverflow](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Just turn the code into a DOM, and then query that like you query any DOM? – Mike 'Pomax' Kamermans Jul 30 '22 at 22:18

2 Answers

Don't use regular expressions to parse HTML; use an HTML parser. Your browser already has one built in for you to use:

let code = `<div>\\frac{5}{6}</div>`;
let doc = new DOMParser().parseFromString(code, `text/html`)
let content = doc.querySelector(`div`).textContent

But of course, note that your string is missing a \:

  • "\\f" in a string declaration is a backslash, followed by the letter f
  • "\f" in a string declaration is the FORM FEED control code (\u000c)
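
To make the difference between the two spellings concrete, here is a minimal sketch that compares them by length and character code. It also shows `String.raw`, which was suggested in the comments on the question, as one possible way to write the backslash without doubling it. The variable names are only illustrative.

```javascript
// "\f" is a single FORM FEED character (code 12).
const formFeed = "\f";
// "\\f" is two characters: a backslash (code 92) followed by the letter f.
const escaped = "\\f";
// String.raw leaves backslashes in a template literal unprocessed.
const raw = String.raw`<div>\frac{5}{6}</div>`;

console.log(formFeed.length, formFeed.charCodeAt(0)); // 1 12
console.log(escaped.length, escaped.charCodeAt(0));   // 2 92
console.log(raw === "<div>\\frac{5}{6}</div>");       // true
```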

If your string came "from somewhere" then make sure to properly escape your content before you start to work with it. For example, if this is user input and you composed it, like:

let text = `<div>${input.value}</div>`;

then: make sure to escape that value before you template it in.

Mike 'Pomax' Kamermans

JavaScript converts escape sequences in string literals into special characters, so the literal \ is lost. If you need to preserve it, either escape the backslash itself as \\ or convert the special characters back into their escaped form:

const unchar = ((dict={"\b":"\\b","\f":"\\f","\n":"\\n","\r":"\\r","\t":"\\t","\v":"\\v"})=>text=>text.replace(/[\b\f\n\r\t\v]/g,c=>dict[c]))();

var text = `<div>\frac{5}{6}</div>`;
var inBetween = text.replace(/<div>(.*?)<\/div>/g,'$1');

console.log(text);
console.log(inBetween);
console.log(unchar(text));
console.log(unchar(inBetween));
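
If the goal is only to see the hidden character while debugging, the `JSON.stringify` approach mentioned in the comments on the question may be enough. The following is a small sketch of that idea, not a replacement for escaping the backslash in the source string. Note that the result is wrapped in double quotes.

```javascript
// The literal below contains a FORM FEED character, not a backslash and an "f".
const text = "<div>\frac{5}{6}</div>";
const inBetween = text.replace(/<div>(.*?)<\/div>/g, "$1");

// JSON.stringify writes the form feed back out as the two characters \ and f.
const visible = JSON.stringify(inBetween);
console.log(visible); // "\frac{5}{6}" (including the surrounding quotes)
```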
vanowm
  • But again: don't use regex to parse HTML, because regex can't parse HTML grammar. It can only "hopefully get it right", so use `DOMParser`. That's explicitly what it's for. – Mike 'Pomax' Kamermans Jul 30 '22 at 22:30
  • @Mike'Pomax'Kamermans You are correct, but that's not what the question is about...Even with the DOMParser you won't get the `\\` in the string. – vanowm Jul 30 '22 at 22:32
  • No, but just because that's not the core of the question, that doesn't mean not correcting _another_ problem identified in the post =) (kinda like still using `var`. If we're showing modern `const`, better to use `let` as well, with a small note as to why, so that the person asking the question can grow as JS dev) – Mike 'Pomax' Kamermans Jul 30 '22 at 22:52