0

Let's say I have the following string:

var string = "<td>123</td><td>asd</td>";

I want to take the values of the tds and put them in an array. I tried iterating over the match result with forEach, but my regex stops after the first closing td and returns everything between < and >.

var regex = '<([^<> ]*)([^<>]*)?>([^>]*)<\/([^<>]*)>';
var string = "<td>123</td><td>asd</td>";
var result = string.match(regex);

result.forEach(function($var){
    console.log($var);
});

Output:

<td>123</td>
td
undefined
123
td 

I need to manipulate the values, so I want to work on them directly inside the forEach callback without first splitting into an array.

Can I make this work with a regex? I can't use jQuery or append the string to the HTML.

Avinash Raj

5 Answers

1

Using regex alone to parse the DOM is a no-no. However...

If you don't have nested <td> elements, you can use the following code to get an array of values:

var string = "<td>123</td><td>asd</td>";
var tds = string.split("</td>");
var values = [];

for (var i = 0; i < tds.length - 1; i++) { // the last element is empty when you split like this
    values.push(tds[i].substr(4)); // omit the leading "<td>"
}

alert(values);

More complex structures could be a problem. In that case I would advise splitting the string into individual TDs first and then extracting each value with a regex (/<td>(.*?)<\/td>/g, selecting group 1). For this example, though, it works fine.
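
A minimal sketch of the "select group 1" idea, assuming flat cells with no attributes and no nesting. The helper name cellValues is hypothetical, and the pattern is written with a lazy quantifier and an escaped slash so that each match ends at the nearest closing tag:

```javascript
// Hypothetical helper: collects capture group 1 of every <td>...</td> match.
// Assumes cells are not nested and carry no attributes.
function cellValues(html) {
    var re = /<td>(.*?)<\/td>/g; // lazy, so each match ends at the nearest </td>
    var values = [];
    var match;
    // With the g flag, exec() resumes from re.lastIndex on each call.
    while ((match = re.exec(html)) !== null) {
        values.push(match[1]);
    }
    return values;
}

console.log(cellValues("<td>123</td><td>asd</td>")); // logs the two values, 123 and asd
```

Looping with exec() keeps the capture groups for every match, which string.match() with the g flag does not.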


Dropout
  • works great but I want to give the string as a whole row meaning ex. "value" and that way the first value is value – Иван Божков Jul 29 '14 at 11:55
  • then just get rid of the tr wrapper, either by taking out a substring or by regex – Dropout Jul 29 '14 at 13:27
  • of course not.. it's substr(4).. this is for empty tds.. doesn't matter though, this approach is wrong altogether - you shouldn't parse the DOM with regexes and substrings, as I mentioned, because of things like this.. – Dropout Jul 30 '14 at 06:27
0

You could try the code below:

> var re = /[^<>]+(?=<\/)/g;
undefined
> var result = string.match(re);
undefined
> result.forEach(function($var){
... console.log($var);
... });
123
asd
> console.log(result);
[ '123', 'asd' ]

Explanation:

  • [^<>]+ matches one or more characters that are neither < nor >.
  • (?=<\/) is a positive lookahead: it asserts that the match is immediately followed by </, without consuming those characters.
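
As a hedged sketch, the same pattern can be wrapped so that it also copes with a whole row and with input that has no matches. The function name textBeforeClosingTags is hypothetical; the only addition is the fallback to an empty array, because match() returns null when nothing matches and calling forEach on null would throw:

```javascript
// Hypothetical wrapper around the pattern from this answer.
function textBeforeClosingTags(html) {
    var re = /[^<>]+(?=<\/)/g;
    // match() returns null when nothing matches, so fall back to an empty array
    return html.match(re) || [];
}

// The surrounding <tr> does not get in the way here: no text sits directly before </tr>.
textBeforeClosingTags("<tr><td>123</td><td>asd</td></tr>").forEach(function (value) {
    console.log(value); // logs 123, then asd
});
```

Because [^<>]+ needs at least one character, an empty cell produces no entry at all, so the positions in the result may not line up with the columns.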
Avinash Raj
0

Split the string on either <td> or </td> and discard the empty strings.

This will work for you:

var string = "<td>123</td><td>asd</td>";
var contents = string.split(/<td>|<\/td>/);
contents = contents.filter(function (el) { return el !== ""; });

console.log(contents); // ["123", "asd"]
Mritunjay
0

Do not parse HTML using RegExp!

Here is a jQuery version for your problem:

$("<tr/>").append("<td>123</td><td>asd</td>") // build a row with the cells
.find("td") // get the cells
.map(function() { return $(this).text(); }) // for each cell, get the content
.get(); // convert the jQuery collection to a plain array

Result: ["123", "asd"]

Edit: I see you can't use jQuery. That's unfortunate, because you really need a DOM parser, and jQuery is elegant and can do much more.

Silviu Burcea
0

Avoid parsing HTML/XML with regex!

Here is a way to do it with plain JavaScript:

function extractValues(code)
{
    // Parse the cells inside a detached <tr>; nothing is added to the page
    var tr = document.createElement('tr');
    tr.innerHTML = code;
    var tds = tr.getElementsByTagName('td');
    var values = [];
    for (var i = 0; i < tds.length; i++) {
        values.push(tds[i].innerHTML);
    }

    return values;
}

console.log(extractValues("<td>123</td><td>asd</td>"));

If you really, really want a regex, use this:

/<td>((?:.(?!<\/td>))*.?)<\/td>/g
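
A sketch of how that pattern might be applied, assuming flat, non-empty cells. The helper name extractWithRegex is hypothetical; capture group 1 holds the cell content:

```javascript
// Hypothetical helper applying the pattern above; group 1 is the cell content.
function extractWithRegex(code) {
    var re = /<td>((?:.(?!<\/td>))*.?)<\/td>/g;
    var values = [];
    var match;
    // With the g flag, each exec() call continues after the previous match.
    while ((match = re.exec(code)) !== null) {
        values.push(match[1]);
    }
    return values;
}

console.log(extractWithRegex("<td>123</td><td>asd</td>")); // logs the two values, 123 and asd
```

One caveat, as far as I can tell: with an empty cell such as <td></td> the capture runs on into the following cell, so this is only safe when every cell has content.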
bukart