
HTML Source:

<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td>

I want to get the text inside the td tag as an array, like this:

["2644-3/4", "QPSK", "301 - 4864"]

What is the best way to do this?

Thanks in advance!

Andy
T Breeze

2 Answers


Let's start with:

let td = '<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td>'

How about:

td.split('<br>').map(part => cheerio.load(part).text().trim())
// Array(3) ["2644-3/4", "QPSK", "301 - 4864"]
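
If the markup might spell the line break as `<br/>` or `<BR>`, the same split-then-extract idea can be sketched as below. This is only an illustration in plain JavaScript with no Cheerio, so it runs on its own; `cellLines` and `sample` are hypothetical names, and stripping tags with a regex is a simplification that is reasonable only for small, known fragments like this one.

```javascript
// Hypothetical helper: split a cell's HTML on <br> variants and
// return the trimmed text of each piece.
function cellLines(html) {
  return html
    .split(/<br\s*\/?>/i)           // matches <br>, <br/>, <br /> and <BR>
    .map(part =>
      part
        .replace(/<[^>]*>/g, '')    // drop any remaining tags (simplification)
        .replace(/&nbsp;/g, ' ')    // only this one entity is decoded here
        .trim()
    )
    .filter(line => line.length > 0); // skip pieces with no text
}

const sample = '<td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td>';

console.log(cellLines(sample));
```

For real pages with other entities or nested markup, passing each piece through `cheerio.load(part).text()` as above is the safer choice.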
pguardiario

Your HTML doesn't parse as it stands, so I think the only way to do this is to fix it and then use a regex to pick out the information:

// The fixed HTML: the td is wrapped in table/tr elements.
// Ideally there should be a closing </font> tag too, but Cheerio seems to tolerate its absence.
const html = '<table><tr><td bgcolor="#ffffbb" colspan=2><font face="Verdana" size=1>2644-3/4<br>QPSK<br><font color="darkgreen">&nbsp;&nbsp;301</font> - 4864</td></tr></table>';
const $ = cheerio.load(html);

// Grab the cell
const $td = $('td');

// (\d{4}-\d\/\d) - captures the first group, e.g. "2644-3/4"
// ([A-Z]{4}) - captures the second group, e.g. "QPSK"
// (?:.*) - non-capturing group that skips whatever sits in between
// (\d{3} - \d{4}) - captures the final group, e.g. "301 - 4864"
const re = /(\d{4}-\d\/\d)([A-Z]{4})(?:.*)(\d{3} - \d{4})/;

// Match the text against the regex and remove the full match
const arr = $td.text().match(re).slice(1);

// Outputs `["2644-3/4","QPSK","301 - 4864"]`
console.log(arr);
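
One caveat: `match` returns `null` when the text does not fit the pattern, so calling `.slice(1)` on it throws. A minimal sketch of a guarded version follows. `extractFields` and `cellText` are hypothetical names, and `cellText` is an assumed stand-in for what `$td.text()` returns for this cell, so the snippet runs without Cheerio.

```javascript
// Assumed value of $td.text() for the cell above; the two &nbsp;
// entities become non-breaking spaces (U+00A0).
const cellText = '2644-3/4QPSK\u00a0\u00a0301 - 4864';

// Same three capture groups, with \s* in place of (?:.*) because
// \s also matches U+00A0 in JavaScript.
const pattern = /(\d{4}-\d\/\d)([A-Z]{4})\s*(\d{3} - \d{4})/;

// Hypothetical helper: return the captured fields, or an empty
// array when the text does not match.
function extractFields(text) {
  const match = text.match(pattern);
  return match ? match.slice(1) : [];
}

console.log(extractFields(cellText));
console.log(extractFields('something else'));
```

Returning an empty array is just one choice; throwing a descriptive error may suit a scraper better if a non-matching cell means the page layout has changed.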
Andy