I am working on an application that needs to take the innerHTML of the body and extract its text into a JSON object. That JSON will be sent for translation, and the translated JSON will then be used as input to rebuild the same HTML markup with the translated text. Please see the snippets below.
HTML Input
<section>Hello, <div>This is some text which I need to extract.<a class="link">It can be <strong> complicated.</strong></a></div><span>The extracted text should contain the html tag if it has any html tag in the span,p or a tag</span><p>Please see the <span>desired output below.</span></p>Thanks!</section>
Translation JSON Output
{
"text1":"Hello, ",
"text2":"This is some text which I need to extract.",
"text3":"It can be <strong> complicated.</strong>",
"text4":"The extracted text should contain the html tag if it has any html tag in the span,p or a tag",
"text5":"Please see the <span>desired output below.</span>",
"text6":"Thanks!"
}
Translated JSON Input
{
"text1":"Hello,-in spanish ",
"text2":"This is some text which I need to extract.-in spanish",
"text3":"It can be <strong> complicated.-in spanish</strong>",
"text4":"The extracted text should contain the html tag if it has any html tag in the span,p or a tag-in spanish",
"text5":"Please see the <span>desired output below.-in spanish</span>",
"text6":"Thanks!-in spanish"
}
Translated HTML Output
<section>Hello,-in spanish <div>This is some text which I need to extract.-in spanish<a class="link">It can be <strong> complicated.-in spanish</strong></a></div><span>The extracted text should contain the html tag if it has any html tag in the span,p or a tag-in spanish</span><p>Please see the <span>desired output below.-in spanish</span></p>Thanks!-in spanish</section>
I tried various regexes. Below is one of the flows I ended up with, but it does not produce the desired output.
//encode
const bodyHTML = '<a class="test">hello world<strong> this is gonna be hard</strong></a>';
//replace the quotes with escape quotes
const htmlContent = bodyHTML.replace(/"/g, '\\"');
let count = 0;
let translationObj = {};
let newHtml = htmlContent.replace(/\>(.*?)\</g, function(match) {
//remove the special character
match = match.replace(/\>|\</g, '');
count = count + 1;
translationObj[count] = match;
return '>~' + count + '~<';
});
const translationJSON = '{"1":"hello world in spanish","2":" this is gonna be hard in spanish","3":""}';
//decode
let translatedHtml = '';
const translatedObj = JSON.parse(translationJSON);
translatedHtml = newHtml.replace(/~(.*?)~/g, function(match) {
//remove the special character
match = match.replace(/~/g, '');
return translatedObj[match];
});
//replace the escape quotes with quotes
translatedHtml = translatedHtml.replace(/\\"/g, '"');
//console.log()
console.log("bodyHTML", bodyHTML);
console.log('translationObj', translationObj);
console.log("translationJSON", translationJSON);
console.log('newHtml', newHtml);
console.log("translatedHtml", translatedHtml);
I am looking for a working regex, or any other approach in JavaScript, that produces the desired result: all the text inside the HTML collected into a JSON object. One more condition is that text containing inner HTML tags must not be split, so that we don't lose the context of the sentence. For example:
<p>Click <a>here</a></p>
this should be treated as a single text, "Click <a>here</a>". I hope that clarifies everything.
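To make the splitting problem concrete, here is a minimal reproduction that runs the same extraction step from my attempt on that snippet. The wrapper name `extractWithRegex` is only for illustration; the regex and the `~n~` placeholders are the ones from the flow above.

```javascript
// Same extraction step as in the attempt above, wrapped in a function.
function extractWithRegex(html) {
  const found = {};
  let count = 0;
  // Capture everything between a ">" and the next "<".
  const template = html.replace(/>(.*?)</g, (match, text) => {
    count += 1;
    found[count] = text;
    return '>~' + count + '~<';
  });
  return { found, template };
}

const result = extractWithRegex('<p>Click <a>here</a></p>');
// found ends up with three entries: 'Click ', 'here' and an empty string,
// instead of the single entry 'Click <a>here</a>' that I need.
console.log(result.found);
// The template keeps the inner <a> tags outside the extracted text.
console.log(result.template);
```

So the sentence is cut at every tag boundary, and an empty entry is created between two adjacent tags.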
Thanks in advance!
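For reference, one direction that might work instead of a single regex is to tokenize the markup and keep the inner HTML of `span`, `p` and `a` elements together. The following is only a rough sketch under assumptions, not a verified solution: the names `extractTexts`, `applyTranslations` and `UNIT_TAGS`, and the `{{textN}}` placeholder format, are all hypothetical, and the tokenizer is not a real HTML parser (comments, `<script>`/`<style>` content, unclosed tags and `>` inside attribute values are not handled properly).

```javascript
// Tags whose whole inner HTML is kept as one translation unit (assumption
// taken from the "span, p or a" wording in the sample output).
const UNIT_TAGS = new Set(['span', 'p', 'a']);

// Splits markup into tags, text runs, and stray "<" characters.
const TOKEN = /<(\/?)([a-zA-Z][\w-]*)[^>]*>|[^<]+|</g;

function extractTexts(html) {
  const texts = {};
  let template = '';
  let count = 0;
  let unitTag = null; // name of the unit tag currently being collected
  let depth = 0;      // nesting depth of that same tag name
  let buffer = '';    // inner HTML collected for the current unit

  const addEntry = (value) => {
    const key = 'text' + (++count);
    texts[key] = value;
    return '{{' + key + '}}';
  };

  for (const [token, slash, rawName] of html.matchAll(TOKEN)) {
    const name = rawName ? rawName.toLowerCase() : null;
    if (unitTag !== null) {
      // Inside a unit: only tags with the unit's own name change the depth.
      if (name === unitTag) depth += slash ? -1 : 1;
      if (depth === 0) {
        template += (buffer.trim() === '' ? buffer : addEntry(buffer)) + token;
        unitTag = null;
        buffer = '';
      } else {
        buffer += token;
      }
    } else if (name && !slash && UNIT_TAGS.has(name)) {
      template += token; // opening tag of a new unit
      unitTag = name;
      depth = 1;
    } else if (!name && token.trim() !== '') {
      template += addEntry(token); // loose text outside any unit
    } else {
      template += token; // any other tag, or whitespace-only text
    }
  }
  if (unitTag !== null && buffer.trim() !== '') template += addEntry(buffer);
  return { texts, template };
}

// Puts translated strings back; unknown keys are left as placeholders.
function applyTranslations(template, translated) {
  return template.replace(/\{\{(text\d+)\}\}/g, (whole, key) =>
    key in translated ? translated[key] : whole);
}

const html = '<p>Click <a>here</a></p>';
const { texts, template } = extractTexts(html);
console.log(texts);    // one entry, text1, holding 'Click <a>here</a>'
console.log(template); // '<p>{{text1}}</p>'
console.log(applyTranslations(template, texts) === html); // true
```

The `texts` object would be sent for translation and the result passed to `applyTranslations` together with the stored `template`. A DOM-based walk in the browser would likely be more robust than this string tokenizer, and text that already contains something like `{{text1}}` would clash with the placeholders.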
`<p>Click <a>here</a></p>` to become `"Click here"`. – Ivan May 23 '18 at 16:26