I am using AspPDF to create a PDF from HTML.

It looks like the HTML needs to be on a single line, with all whitespace removed, before it's passed to the ImportFromUrl method. This is an example from the support site:

str = "<HTML><TABLE><TR><TD>Text1</TD><TD>Text2</TD></TR></TABLE></HTML>"
Doc.ImportFromUrl str 

Currently my HTML is pulled in from an external page and it's all formatted, so I need to convert it to look like the example above. Can I use jQuery to do this?

Reference http://www.asppdf.com/manual_13.html#13_5

Leto
kb.

3 Answers

3

Use this regular expression for spaces only:

var HTML = "<h1>hh ee</h1>     <h2>heyy  heyyy</h2>";
document.getElementById("after").innerText = HTML.replace(/>[ ]+</g, "><");
document.getElementById("before").innerText = HTML;
<h1 id="before"></h1>
Becomes
<h1 id="after"></h1>
And this for tabs, new lines and spaces:

var HTML = "<h1>hh ee</h1>    <h2>heyy  heyyy</h2>";
document.getElementById("after").innerText = HTML.replace(/>[\n\t ]+</g, "><");
document.getElementById("before").innerText = HTML;
<h1 id="before"></h1>
Becomes
<h1 id="after"></h1>
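A minimal sketch that wraps the second pattern in a reusable function, assuming the markup contains no whitespace-sensitive elements such as `<pre>` or `<textarea>`. The name `stripInterTagWhitespace` is hypothetical, and `\s` stands in for `[\n\t ]` so that carriage returns are covered as well. Only whitespace sitting directly between a `>` and a `<` is removed; text inside elements is left alone.

```javascript
// Hypothetical helper: removes whitespace that sits between a closing ">"
// and the next "<", then trims the start and end of the string.
function stripInterTagWhitespace(html) {
  return html.replace(/>\s+</g, "><").trim();
}

const formatted =
  "\n<table>\n\t<tr>\r\n\t\t<td>Text 1</td>   <td>Text 2</td>\n\t</tr>\n</table>\n";
const singleLine = stripInterTagWhitespace(formatted);

console.log(singleLine);
// prints: <table><tr><td>Text 1</td><td>Text 2</td></tr></table>
```

One limitation to keep in mind: a space between two adjacent inline elements (for example between a `</b>` and an `<i>`) is also removed, so the rendered words run together.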
DividedByZero
  • Thanks, that's the regex I was looking for! – kb. Nov 05 '14 at 12:51
  • not the best solution if you have `test test2` since you suddenly end up with `testtest2` instead of `test test2`. – DoXicK Nov 05 '14 at 12:54
  • @RandomUser http://kangax.github.io/html-minifier/ understands HTML, because it's based on an actual HTML parser. Slightly more heavyweight than your regex though :) – Olly Hodgson Nov 05 '14 at 13:07
  • @RandomUser if I knew an answer, I would've given it. I'm just pointing out a possible flaw :-) – DoXicK Nov 05 '14 at 13:11
  • My HTML is pretty much static & only the innerHTML changes on various elements, so I am happy with the solution from @RandomUser – kb. Nov 05 '14 at 13:13
  • HTML.replace(/>[\n\t ]+</g, "><").trim() will remove whitespace from the start and end too. – Assad Nazar Mar 12 '21 at 13:56
3

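I use the regex "someHtml".replace(/\n\s+|\n/g, ""). It's not perfect, but it keeps the content intact and deletes most of the unnecessary whitespace: every line break is removed along with the indentation that follows it, while spaces within a line are kept.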

var dom = document.documentElement.outerHTML;
$("body").text(dom.replace(/\n\s+|\n/g, ""));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div>
  lorem ipsum
</div>



<div>
  more text
</div>   <div>
  test 123
</div>
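A sketch of the same idea extended to Windows (`\r\n`) and bare carriage return (`\r`) line endings, run against a plain string rather than the live document. The function name `removeLineBreaks` and the sample string are assumptions made for illustration.

```javascript
// Hypothetical helper: deletes every line break (\r\n, \r or \n) together
// with any whitespace that follows it. Spaces within a line are kept.
function removeLineBreaks(html) {
  return html.replace(/(?:\r\n|\r|\n)\s*/g, "");
}

const source =
  "<div>\r\n  lorem ipsum\r\n</div>\n\n\n<div>\n  more text\n</div>   <div>\r  test 123\r</div>";
const flattened = removeLineBreaks(source);

console.log(flattened);
// prints: <div>lorem ipsum</div><div>more text</div>   <div>test 123</div>
```

Because the line break is deleted rather than replaced with a space, text that wraps across two lines is joined without a separator, which is part of why the approach is "not perfect".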
Robin Knaapen
  • Can you also explain what it removes? Also, I think you should consider checking for the newline characters used by different operating systems; see https://stackoverflow.com/questions/10805125/how-to-remove-all-line-breaks-from-a-string – shyam_ Oct 11 '18 at 19:20
0

You can do this with a plain JavaScript string replace (jQuery is not required):

  str = str.replace(/\s+/g, '');
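For comparison, a small sketch of what that call does next to a gentler variant that collapses each run of whitespace into a single space. The sample string and the variable names `stripped` and `collapsed` are illustrative only.

```javascript
const str = '<p class="note">\n  Hello   world\n</p>';

// Removes every whitespace character, including those inside text and tags.
const stripped = str.replace(/\s+/g, "");

// Collapses each run of whitespace to one space: single line, words intact.
const collapsed = str.replace(/\s+/g, " ");

console.log(stripped);  // prints: <pclass="note">Helloworld</p>
console.log(collapsed); // prints: <p class="note"> Hello world </p>
```

The first form breaks both the tag (`<pclass=...>`) and the text, while the second still yields one line and leaves attributes and words separated.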
Deepak
  • Thanks, but this is replacing all whitespace inside the tags too, like this... – kb. Nov 05 '14 at 12:31
  • Please refer to this link: [link](http://stackoverflow.com/questions/1539367/remove-whitespace-and-line-breaks-between-html-elements-using-jquery) – Deepak Nov 05 '14 at 12:36