
I'm trying to use regex in JavaScript to scrape some information from a website. At the moment, I have two separate patterns. The first extracts the table body ...

/(?:<tbody>)(.*?)(?:<\/tbody>)/

whilst the second splits the returned content into rows...

/(?:<tr>)(.*?)(?:<\/tr>)/g

... now I was wondering if I could do this in one 'move', as it were. I want it to be ...

/(?:<tbody>)((?:<tr>)(.*?)(?:<\/tr>))(?:<\/tbody>)/g

but no joy. Of course, I could use the .split method, but this doesn't fit with my broader process: I have an array of regexes that I loop over with forEach to extract the info I need.

Any help, much appreciated...

Mark Peace

2 Answers


Parse the string containing the HTML, for example, using jQuery:

var html = '<table><tbody><tr><td>row 1</td></tr><tr><td>row 2</td></tr></tbody></table>';    
var dom = $.parseHTML(html);    
var rows = $(dom).find('tr');
console.log(rows);

Output:

Object[tr, tr]

If you want to access the HTML as a string, you can use .outerHTML, for example:

$.each(rows, function (i, el) {
    console.log(el.outerHTML);
});

Output:

<tr><td>row 1</td></tr>
<tr><td>row 2</td></tr>
Tom Fenech

You can do it by using a single regex: /(<tbody>|<\/tbody>|<tr>.*?<\/tr>)/g.

Here it is:

var text = '<tbody><tr><td>sample</td></tr><tr><td>sample2</td></tr></tbody>';
var exp = /(<tbody>|<\/tbody>|<tr>.*?<\/tr>)/g;

// With the g flag, match() returns every full match as an array
var array = text.match(exp);

> ["<tbody>", "<tr><td>sample</td></tr>", "<tr><td>sample2</td></tr>", "</tbody>"]

The first element of the array will be <tbody>, the last one will be </tbody>, and every element in between will be a row of the table, like this:

> ["<tbody>", row1, row2, row3, ..., rowN, "</tbody>"]

If you only need the rows, you can drop the <tbody> alternatives from the pattern:

var text = '<tbody><tr><td>sample</td></tr><tr><td>sample2</td></tr></tbody>';
var exp = /(<tr>.*?<\/tr>)/g;

// Matches every <tr>...</tr> in the string, inside a <tbody> or not
var lines = text.match(exp);

> ["<tr><td>sample</td></tr>", "<tr><td>sample2</td></tr>"]
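Note that the rows-only pattern above matches every <tr>, including rows in a <thead> or <tfoot>. If the rows have to come from inside <tbody> and the whole thing still has to be a single regex, one possible sketch is to add a lookahead that requires a closing </tbody> further on with no new <tbody> opening first. The names rowsInTbody, extractRows and sample are made up for this illustration, and the pattern assumes bare tags (no attributes) and HTML without line breaks, since . does not match newlines:

```javascript
// Hypothetical pattern: a <tr>...</tr> that is followed by </tbody>
// before any new <tbody> opens, i.e. a row that sits inside a <tbody>.
// (?:(?!<\/tr>).)* stops the row body from running past its own </tr>.
var rowsInTbody = /<tr>((?:(?!<\/tr>).)*)<\/tr>(?=(?:(?!<tbody>).)*<\/tbody>)/g;

// Hypothetical helper: returns the matching rows as strings.
function extractRows(html) {
    // With the g flag, match() returns the full matches, or null if there are none
    return html.match(rowsInTbody) || [];
}

var sample = '<table><thead><tr><th>head</th></tr></thead>' +
    '<tbody><tr><td>row 1</td></tr><tr><td>row 2</td></tr></tbody></table>';

console.log(extractRows(sample));
// the header row is skipped; only the two <tbody> rows are returned
```

This keeps everything in one pattern, so it can sit in an array of regexes, but it is still fragile on real-world markup; parsing the HTML is the more robust route.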
Marco Bonelli