0

Suppose I have the code below. As you can see, some script or data is wrapped in "%%[" and "]%%", which is normally not legal HTML. That is original data I want to keep. Meanwhile, I want to add, change, or remove attributes on the <table>, then output the modified code.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
%%[
Sever Language data here
]%%
<title>%%=v(@variable)=%%</title>
</head>
<body>
    <div style="display:none;">
        <custom name="opencounter" type="tracking">
        <img width='0' height='0' src='%%=v(@adometry)=%%'> 
    </div>
    <table width="100%" cellpadding="0" cellspacing="0" border="0" bgcolor="#ffffff">
        <tr>
            <td align="center">Something here
            </td>
        </tr>
    </table>
</body>

I've tried many ways to do this. I tried BeautifulSoup, but it changes special characters such as "—" to "&mdash". I want to keep special characters as they are when they are not written as escaped entities in the source. BeautifulSoup also changes the order of the attributes, and it converts the <custom> tag to <custom></custom>. I think BeautifulSoup is a library that is good at parsing data, not at manipulating it.

I also tried jsdom a long time ago, and I think it worked fine, but it still has trouble with the <custom> tag, and it changes <img> to <img />. I'm not sure whether jsdom keeps the illegal data. It is also very slow...

I've also tried using jQuery in the browser and outputting the result with the .html() function, but that changes the order of the attributes. For the <table> tag, it also inserts a <tbody>, which isn't what I want.

So suppose I want to change cellpadding to 10. The code should look like the example below. I can probably accept a different attribute order. Does anyone know which library I could use, or what else I could do to meet this requirement? Any comments are welcome! BTW, I'm not very familiar with regular expressions, and I think they would frustrate me...

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
%%[
Sever Language data here
]%%
<title>%%=v(@variable)=%%</title>
</head>
<body>
    <div style="display:none;">
        <custom name="opencounter" type="tracking">
        <img width='0' height='0' src='%%=v(@adometry)=%%'> 
    </div>
    <table width="100%" cellpadding="10" cellspacing="0" border="0" bgcolor="#ffffff">
        <tr>
            <td align="center">Something here
            </td>
        </tr>
    </table>
</body>
Woody
  • Changing the order of attributes is totally OK, because the order is not defined. If you want to keep the raw bytes, use string replacement or regular expressions. – Daniel Jul 19 '14 at 07:35
  • Hi Daniel, the problem is code like `` will be converted as ` ` That's pretty frustrating... – Woody Jul 19 '14 at 09:31
  • What you observe are different representations of the same HTML document tree. Btw, you state that you use XHTML, but your document is not valid XHTML. – Daniel Jul 19 '14 at 10:06

2 Answers

0

I believe jQuery should do what you want, but not with the .html() function. Keep the table as-is, then use jQuery to select it and modify the attributes.

I am not sure which attributes you want to add, change, or remove, but code like the following would work:

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<script type="text/javascript">
  $(function() {
    var $table = $('table');
    $table.attr('cellpadding', 10);  // modify
    $table.removeAttr('bgcolor');  // remove
    $table.attr('style', 'color: yellow;');  // add
  });
</script>

Of course, you should add a class or id to your table to make it easier to select by jQuery.

Parsers like BeautifulSoup work by parsing the markup into objects that they understand. When they write out the HTML, they serialize the data in those objects, not the original string that was parsed.
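
To make that concrete, here is a minimal sketch using Python's standard-library `html.parser` rather than BeautifulSoup itself (that swap is my choice for illustration). The `AttrCollector` class and the `reserialize` helper are hypothetical names, and the serializer is deliberately naive. It only shows that the parsed attributes no longer remember the quote style used in the source, so a tag rebuilt from them cannot match the original text.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records the raw source text and the parsed attributes of each start tag."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the tag exactly as written in the source;
        # attrs is a list of (name, value) pairs with the quotes already stripped.
        self.seen.append((self.get_starttag_text(), tag, attrs))


def reserialize(tag, attrs):
    """Naive serializer (illustration only): always emits double quotes."""
    return "<" + tag + "".join(f' {k}="{v}"' for k, v in attrs) + ">"


source = "<img width='0' height='0' src='%%=v(@adometry)=%%'>"

collector = AttrCollector()
collector.feed(source)
collector.close()

raw, tag, attrs = collector.seen[0]
rebuilt = reserialize(tag, attrs)

print(raw)      # the tag as it appears in the source
print(rebuilt)  # the tag rebuilt from the parsed objects
```

The two printed lines differ only in quote style here, but the same effect is what reorders attributes or rewrites entities in a full parse-and-serialize round trip.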

Oleg
  • Hey Oleg, thank you for your comments. The question is: after I have done the add/change/remove work, I have to export the modified code to a file or pass it somewhere, maybe to a backend. So, using jQuery, is there a good way to export the whole HTML, including the raw data and the doctype? Thanks! – Woody Jul 19 '14 at 08:35
  • I see. No, I don't know how to do that. Perhaps [using jquery with nodejs](http://stackoverflow.com/questions/1801160/can-i-use-jquery-with-node-js)? But I've never tried that, so it might have the same issues as BeautifulSoup. – Oleg Jul 19 '14 at 08:43
  • Alright, thank you all the same. I'll try to see if node can do this. Thanks again! – Woody Jul 19 '14 at 08:52
0

The only way to meet your requirements is to use string manipulation:

text = text.replace('cellpadding="0"', 'cellpadding="10"')
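
If more than one fixed replacement is needed, the same idea can be narrowed to the `<table ...>` start tags so that everything else in the file stays byte-for-byte unchanged. The following is only a sketch under stated assumptions: the helper names `set_attr`, `remove_attr`, and `edit_tables` are hypothetical, no attribute value inside a `<table>` tag contains a `>` character, and no attribute value contains text that itself looks like ` name=value`. It is not a general HTML parser.

```python
import re

# Matches a whole <table ...> start tag. Assumes no ">" inside attribute values.
TABLE_TAG = re.compile(r"<table\b[^>]*>", re.IGNORECASE)

# A double-quoted, single-quoted, or unquoted attribute value.
ATTR_VALUE = r'''(?:"[^"]*"|'[^']*'|[^\s>]+)'''


def set_attr(tag, name, value):
    """Set name="value" inside one start tag, appending the attribute if absent."""
    pattern = re.compile(
        r"(\s)" + re.escape(name) + r"\s*=\s*" + ATTR_VALUE, re.IGNORECASE
    )
    new_tag, count = pattern.subn(
        lambda m: f'{m.group(1)}{name}="{value}"', tag, count=1
    )
    if count == 0:
        # Attribute not present: insert it just before the closing ">".
        new_tag = f'{tag[:-1]} {name}="{value}">'
    return new_tag


def remove_attr(tag, name):
    """Remove the attribute (with or without a value) from one start tag."""
    pattern = re.compile(
        r"\s+" + re.escape(name) + r"(?:\s*=\s*" + ATTR_VALUE + r")?(?=[\s>])",
        re.IGNORECASE,
    )
    return pattern.sub("", tag, count=1)


def edit_tables(text, edit):
    """Apply edit(tag) to every <table ...> start tag; leave all other text alone."""
    return TABLE_TAG.sub(lambda m: edit(m.group(0)), text)


sample = (
    "%%[\nSever Language data here\n]%%\n"
    "<img width='0' height='0' src='%%=v(@adometry)=%%'>\n"
    '<table width="100%" cellpadding="0" cellspacing="0" border="0" bgcolor="#ffffff">\n'
    "<tr><td>Something here \u2014</td></tr>\n</table>\n"
)


def edit(tag):
    tag = set_attr(tag, "cellpadding", "10")        # modify
    tag = set_attr(tag, "style", "color: yellow;")  # add
    return remove_attr(tag, "bgcolor")              # remove


result = edit_tables(sample, edit)
print(result)
```

Because only the matched start tag is rewritten, the `%%[ ... ]%%` block, the single-quoted `<img>` attributes, and literal special characters pass through untouched. Removing whole tags, as opposed to attributes, would need additional patterns that this sketch does not cover.
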
Daniel
  • Well, thank you. But the situation is quite complex: it's not only about changing the cellpadding. There may also be some tags to remove. But anyway, thank you for your comments. – Woody Jul 19 '14 at 13:23