Getting the innerHTML of td from string [JS]

Question

Let's say I have the following string:

var string = "<td>123</td><td>asd</td>";

I want to take the values of the tds and put them in an array. I tried iterating over the result with forEach, but my regex stops after the first closing td and returns everything between < and >.

var regex = '<([^<> ]*)([^<>]*)?>([^>]*)<\/([^<>]*)>';
var string = "<td>123</td><td>asd</td>";
var result = string.match(regex);

result.forEach(function($var){
    console.log($var);
});

Output:

<td>123</td>
td
undefined
123
td

I need to manipulate the values so I can work directly in the foreach function without first splitting to an array.

Can I make this work with a regex? I can't use jQuery or append the string to the html.

Your `regex` variable just contains a string, not a `RegExp` object. You need to delimit it with `//` or use `new RegExp` to create a RegExp — Barmar, Jul 29 '14 at 11:18
var str = str.replace('','') and then var res = str.split("");? — dwana, Jul 29 '14 at 11:18
@dwana I'm not sure whether my tds will have classes or styles, so I can't be sure replace will catch all the right results — Иван Божков, Jul 29 '14 at 11:19
it's best not to use RegExp to process HTML. You should use an HTML DOM parser. — Barmar, Jul 29 '14 at 11:24
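
To make the first comment concrete: `String.prototype.match` converts a string argument with `new RegExp`, so the pattern in the question does run, but without the `g` flag, which is why only the first match and its capture groups come back. Below is a minimal sketch of the regex-literal route, assuming flat cells with no nested tags. The names `cellPattern` and `values` are illustrative, and the caveat from the last comment about regex and HTML still applies.

```javascript
// Regex literal with the global flag; matches <td> cells with optional attributes.
// Assumption: cell contents contain no "<", i.e. no nested markup.
var cellPattern = /<td\b[^>]*>([^<]*)<\/td>/g;
var string = "<td>123</td><td class=\"x\">asd</td>";
var values = [];
var match;

// exec() with a global regex resumes from lastIndex, so each call yields the next cell.
while ((match = cellPattern.exec(string)) !== null) {
    values.push(match[1]); // group 1 is the text between the tags
}

console.log(values); // [ '123', 'asd' ]
```

Each value can be processed inside the loop instead of being pushed to an array.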

score 1 · Accepted Answer · answered Jul 29 '14 at 11:32

Using regex alone to parse the DOM is a no-no. However...

If you don't have nested <td> elements and the cells carry no attributes, you can use the following code to get an array of values:

var string = "<td>123</td><td>asd</td>";
var tds = string.split("</td>");
var values = [];

for (var i = 0; i < tds.length - 1; i++) { // the last element is empty when splitting like this
    values.push(tds[i].substr(4)); // omit the leading "<td>" (4 characters)
}

alert(values);

More complex structures could be a problem, so I would advise splitting the TDs into separate strings and then extracting the values with a regex (/<td>(.*)<\/td>/g, taking group 1). But for this example it works fine.
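
The split-then-extract idea can be sketched as follows. This is only an illustration under the same assumption of non-nested cells. The pattern, which also tolerates attributes on the opening tag, and the variable names are my own, not part of the original answer.

```javascript
var string = '<td class="a">123</td><td>asd</td>';
var pieces = string.split("</td>"); // each piece holds at most one opening <td ...>
var values = [];

for (var i = 0; i < pieces.length; i++) {
    // Capture everything after the opening tag up to the end of the piece.
    var m = /<td\b[^>]*>([\s\S]*)$/.exec(pieces[i]);
    if (m) { // the trailing empty piece does not match and is skipped
        values.push(m[1]);
    }
}

console.log(values); // [ '123', 'asd' ]
```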

Dropout

works great but I want to give the string as a whole row meaning ex. "value" and that way the first value is value – Иван Божков Jul 29 '14 at 11:55
then just get rid of the tr wrapper, either by taking out a substring or by regex – Dropout Jul 29 '14 at 13:27
of course not.. it's substr(4).. this is for empty tds.. doesn't matter though, this approach is wrong altogether - you shouldn't parse DOM with regexes and substrings, as I mentioned, because of things like this.. – Dropout Jul 30 '14 at 06:27

score 0 · Answer 2 · Avinash Raj · 2014-07-29T11:30:37.173

You could try the code below:

> var string = "<td>123</td><td>asd</td>";
undefined
> var re = /[^<>]+(?=<\/)/g;
undefined
> var result = string.match(re);
undefined
> result.forEach(function($var){
... console.log($var);
... });
123
asd
> console.log(result);
[ '123', 'asd' ]

Explanation:

[^<>]+ matches one or more characters other than < or >.
(?=<\/) is a lookahead asserting that the match is immediately followed by </.
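
Two consequences of this pattern are worth seeing in a short sketch. The lookahead accepts any closing tag, not only `</td>`, and `match` returns `null` when nothing matches, so calling `forEach` on the result would throw. The variant below narrows the lookahead to `</td>` and falls back to an empty array. The `<th>` input and the variable names are assumptions made for illustration.

```javascript
var string = "<td>123</td><th>head</th><td>asd</td>";

// Pattern with the generic lookahead: text before ANY closing tag.
var anyClosing = string.match(/[^<>]+(?=<\/)/g) || [];

// Narrowed lookahead: text directly before </td> only.
var tdOnly = string.match(/[^<>]+(?=<\/td>)/g) || [];

console.log(anyClosing); // [ '123', 'head', 'asd' ]
console.log(tdOnly);     // [ '123', 'asd' ]

// "|| []" keeps forEach safe when there is no match at all.
var none = "".match(/[^<>]+(?=<\/td>)/g) || [];
none.forEach(function (value) { console.log(value); }); // prints nothing
```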

edited Jul 29 '14 at 11:30

answered Jul 29 '14 at 11:22

score 0 · Answer 3 · answered Jul 29 '14 at 11:26

Split the string on either <td> or </td> and discard the empty strings.

This will work for you:

var string = "<td>123</td><td>asd</td>";
var contents = string.split(/<td>|<\/td>/);
contents = contents.filter(function (el) { return el !== ""; }); // drop the empty pieces

console.log(contents); // ["123", "asd"]
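
If the cells may carry classes or styles, as the asker mentions in the comments, the same split idea can be extended to tolerate attributes. This is a hedged sketch rather than a robust parser. The separator pattern is my own, it still assumes non-nested cells, and it silently drops empty cells.

```javascript
var string = '<td class="num">123</td><td style="color:red">asd</td>';

// Split on any opening or closing td tag, with or without attributes.
var contents = string.split(/<\/?td\b[^>]*>/).filter(function (el) {
    return el !== ""; // discard the empty strings left between adjacent tags
});

console.log(contents); // [ '123', 'asd' ]
```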

score 0 · Answer 4 · edited May 23 '17 at 12:20

Do not parse HTML using RegExp!

Here is a jQuery version for your problem:

$("<tr/>").append("<td>123</td><td>asd</td>") // build a row with the cells
    .find("td") // get the cells
    .map(function() { return $(this).text(); }) // for each cell, get the content
    .get(); // convert the jQuery object into a plain array

Result: ["123", "asd"]

Edit: I see you can't use jQuery. That's unfortunate, because you really need a DOM parser, and jQuery is elegant and can do much more.

score 0 · Answer 5 · answered Jul 29 '14 at 12:31

Avoid parsing HTML/XML with regex!

Here is a way to do it in plain JavaScript:

function extractValues(code)
{
    // Let the browser parse the markup inside a detached table row.
    var tr = document.createElement('tr');
    tr.innerHTML = code;
    var tds = tr.getElementsByTagName('td'); // query the row, not the result array
    var values = [];
    for (var i = 0; i < tds.length; i++) {
        values.push(tds[i].innerHTML);
    }

    return values;
}

console.log(extractValues("<td>123</td><td>asd</td>"));

If you really, really want a regex, use this (the lookahead sits before the dot so that empty cells are handled too):

/<td>((?:(?!<\/td>).)*)<\/td>/g
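
A pattern of this kind only returns the cell contents through capture group 1, so plain `match` with the `g` flag is not enough. One possible usage sketch relies on `String.prototype.matchAll`, which needs an ES2020 environment. The pattern here places the negative lookahead before the dot, and the sample string and names are assumptions for illustration.

```javascript
// Tempered dot: consume a character only if "</td>" does not start at this position.
var cellPattern = /<td>((?:(?!<\/td>).)*)<\/td>/g;
var string = "<td></td><td>123</td><td>a<b>s</b>d</td>";

// matchAll requires the g flag and yields one match object per cell.
var values = Array.from(string.matchAll(cellPattern), function (m) {
    return m[1]; // inner HTML of the cell, possibly empty
});

console.log(values); // [ '', '123', 'a<b>s</b>d' ]
```

Because `.` does not match line breaks, cells spanning several lines would need `[\s\S]` in place of the dot.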
