I want to remove all children from a div node and, for each paragraph child removed, add its content to a new paragraph element, so that all paragraphs are merged into one and everything else is removed.

However, while moving the paragraphs into this new paragraph, I also want to remove all images (the whole tag, not just its opening and closing angle brackets) and make sure the format is the same for all words (just 11pt standard font).

I hope that makes sense. Here is what I have so far:

var newP = document.createElement('P');
var paras = comment.getElementsByTagName('P'); // live collection: shrinks as paragraphs are removed
var counter = 0;
while (paras[0]) {
    var next = paras[0]; // needs to be 0
    if (next.innerHTML.length > 8) { // skip near-empty paragraphs
        counter++;
        if (counter < 3)
            newP.innerHTML += next.innerHTML + "<br /><br />";
    }

    comment.removeChild(next); // comment is the div which holds the new para
    next = paras[0];

    if (!next)
        newP.innerHTML += "<a href='" + link + "'>click to view full entry</a>";
}
comment.appendChild(newP);

Here is a sample of the HTML markup that needs to be formatted:

<div class="comment">
    <h3>Heading</h3>
    <p style="font-weight: bold">This has been formatted which should be removed. <span style="font-size: 14pt; color: red;"> This also includes all span tags!</span></p>
    <p>This is a para <a href="">with an anchro</a></p>
    <div> this is a div with an image <img src="//placehold.it/64X64" /></div>
    <p>This is an image in the middle of a paragraph <img src="http://www.thinkstockphotos.com.au/CMS/StaticContent/Hero/TS_AnonHP_462882495_01.jpg"/> which I want to remove, and not just the arrows</p>
</div>

And I want it to look like this:

<div class="comment">
    <p>Heading</p>
    <p>This has been formatted which should be removed. This also includes all span tags!</p>
    <p>This is a para with an anchro</p>
    <div> this is a div with an image</div>
    <p>This is an image in the middle of a paragraph which I want to remove, and not just the arrows</p>
</div>

EDIT: This is not a duplicate: I want to completely remove all images, including every attribute and any text inside the image tags (not just the angle brackets that delimit the tag), and remove all formatting!
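The intent of the loop in the attempt above (skip near-empty paragraphs, keep only the first two, then append a link) can be sketched without touching the DOM. This is a minimal sketch, not a definitive implementation: `buildPreview` and `escapeHtml` are hypothetical helper names, and it assumes the paragraph texts have already been read as plain text (for example from `textContent`), so no image or span markup is present to begin with.

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: build the inner HTML for the single merged paragraph.
// texts    - plain-text contents of the original paragraphs
// link     - URL for the trailing "click to view full entry" anchor
// maxParas - how many non-empty paragraphs to keep (the attempt above keeps 2)
function buildPreview(texts, link, maxParas) {
  var kept = texts
    .map(function (t) { return t.replace(/\s+/g, " ").trim(); })
    .filter(function (t) { return t.length > 0; })
    .slice(0, maxParas);

  var body = kept
    .map(function (t) { return escapeHtml(t) + "<br /><br />"; })
    .join("");

  return body + '<a href="' + escapeHtml(link) + '">click to view full entry</a>';
}
```

The result could then be assigned to `newP.innerHTML` in place of the string concatenation inside the loop.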

Mayron
  • This question was marked as a duplicate but the question it is linked to does NOT solve my problem. .replace(/<[^>]*>/g, "") doesn't seem to be working. – Mayron Jul 15 '15 at 13:59
  • Also, I want everything inside the image tags to be removed completely, not just the angle brackets. – Mayron Jul 15 '15 at 14:03
  • Can you share a markup sample, possibly by creating a [fiddle](http://jsfiddle.net/)? – Arun P Johny Jul 15 '15 at 14:04
  • Sure. Basically a form allows users to submit articles they have found. Many are simple wikipedia entries like this: http://codeviewer.org/view/code:5364 – Mayron Jul 15 '15 at 14:06
  • I'll create a fiddle now thank you. But hope the codeviewer link was fine – Mayron Jul 15 '15 at 14:08
  • So you want to remove all the child elements of the parent, replace them with `p` elements, and remove all other elements within it? What should happen if there is an anchor node? Do you want to copy only the text content of the node? – Arun P Johny Jul 15 '15 at 14:19
  • Does something like http://jsfiddle.net/arunpjohny/qc0an4ne/1/ illustrate the problem? – Arun P Johny Jul 15 '15 at 14:21
  • I want to remove all the children from a div, and for any paragraphs removed I want to add their text to just one new paragraph and add that to the parent node. Any images found inside a paragraph should also be removed. I would prefer the format for all words in the new paragraph to be standard 11pt font, but that is not too important. – Mayron Jul 15 '15 at 14:22
  • Yes that illustration helps. I added my own http://jsfiddle.net/MayronEU/6qgavj7f/ where an image can show up in a paragraph which I need to remove and not just by replacing with "" because attribute keywords etc would still show up which makes the paragraph unreadable. – Mayron Jul 15 '15 at 14:26
  • http://jsfiddle.net/arunpjohny/qc0an4ne/3/ - not the best way... – Arun P Johny Jul 15 '15 at 14:41
  • Thank you so much! It's certainly better than anything I can do :) This is exactly what I was hoping for! I would tick this as the best answer but can't with comments.. Shame it was incorrectly marked as a duplicate question as well. – Mayron Jul 15 '15 at 14:53
  • I'll vote to reopen if you provide sample html that you have and the expected result below it. As it is, your question is not entirely clear. (Move your comments into the question, as well.) For example, this doesn't make sense: `I want to completely remove all text between an image tag` – ps2goat Jul 15 '15 at 15:14
  • @ps2goat thanks. I tried rewording the question but it's sort of difficult to explain. I hope that looks better now! – Mayron Jul 15 '15 at 15:34
  • @Mayron, you can link to fiddles, but you should still copy the code (styles, html, js, etc) to SO. The reasoning is that the code will remain on SO even if external websites stop functioning. SO even allows adding html/js/css snippets to your questions and answers, which is essentially what jsfiddle does. – ps2goat Jul 15 '15 at 15:42

1 Answer

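// Show the markup before the cleanup.
alert(document.body.outerHTML);

// Copy the NodeList into an array, then remove every image element entirely.
var imgs = Array.prototype.slice.call(document.querySelectorAll("img"));

for (var i = 0; i < imgs.length; ++i) {
  imgs[i].parentNode.removeChild(imgs[i]);
}

// Flatten each paragraph to plain text, which drops spans, anchors and inline styles.
var ps = Array.prototype.slice.call(document.querySelectorAll("p"));

for (var j = 0; j < ps.length; ++j) {
  var p = ps[j];
  p.textContent = p.textContent; // reassigning replaces child elements with their text
  p.removeAttribute("style");
}

// Show the markup after the cleanup.
alert(document.body.outerHTML);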
<div class="comment">
    <h3>Heading</h3>
    <p style="font-weight: bold">This has been formatted which should be removed. <span style="font-size: 14pt; color: red;"> This also includes all span tags!</span></p>
    <p>This is a para <a href="">with an anchro</a></p>
    <div> this is a div with an image <img src="//placehold.it/64X64" /></div>
    <p>This is an image in the middle of a paragraph <img src="http://www.thinkstockphotos.com.au/CMS/StaticContent/Hero/TS_AnonHP_462882495_01.jpg"/> which I want to remove, and not just the arrows</p>
</div>
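
The snippet above cleans each paragraph in place but does not merge them into one, which the question also asks for. Here is a minimal sketch of that merging step, under stated assumptions: `mergeParagraphs` is a hypothetical helper name, the container is the `.comment` div from the sample markup, and only the text of `p` elements should survive (the heading and the nested div are discarded).

```javascript
// Sketch only: mergeParagraphs is a hypothetical helper, not part of the original answer.
function mergeParagraphs(root, doc) {
  // textContent yields only the words: tags, inline styles and images are left out.
  var texts = Array.prototype.slice.call(root.querySelectorAll("p"))
    .map(function (p) { return p.textContent.replace(/\s+/g, " ").trim(); })
    .filter(function (t) { return t.length > 0; });

  // Empty the container: headings, nested divs, images and the old paragraphs all go.
  while (root.firstChild) {
    root.removeChild(root.firstChild);
  }

  // Add back a single unstyled paragraph holding the merged text.
  var merged = doc.createElement("p");
  merged.textContent = texts.join(" ");
  root.appendChild(merged);
  return merged;
}

// Browser usage, guarded so the file also loads where no DOM exists.
if (typeof document !== "undefined") {
  var comment = document.querySelector(".comment");
  if (comment) {
    mergeParagraphs(comment, document);
  }
}
```

Because the new paragraph carries no inline styles, the 11pt font could come from a stylesheet rule such as `.comment p { font-size: 11pt; }` rather than from script.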
Qwertiy