I am using fetch to get an HTML file. So far I've only figured out how to get the response back as a string, using the text() method:

fetch(url, { credentials: 'same-origin' })
    .then(function(response) {
        return response.text();
    })
    .then(function(text) {
        longAssText = text;
        textExtract = longAssText.match(/<table class='listing' id='customer-tickets'>[\s\S]*<script type='text\/javascript'>/gi);
    });

The string I get back looks something like this (textExtract):

<span class="status status_active">active</span></td>
<td><a href="/tickets/365347-SOME-TITLE">#365347 SOME-TITLE</a></td>
<td>2018-03-12 09:14:34</td>
<td>2018-03-12 10:12:46</td>
<td>some category</td>
</tr>
<tr class='even'>
<td>
<img align="absmiddle" alt="Service_request_ticket" src="/images/service_request_ticket.gif?1520519528" title="some attribute" />
<img align="absmiddle" alt="Number_1" src="/images/number_1.gif?1520519528" title="Saken ligger hos 1. linje" />
<img align="absmiddle" alt="Flag_disabled" src="/images/flag_disabled.png?1520519528" title="Priority: Normal" />
</td>
<td class='ttstatus'><span class="status status_closed">closed</span></td>
<td><a href="/tickets/150640-vs-sender-e-post-brn001ba9bd7a93_000186">#150640 VS: SOME TITLE</a></td>
<td>2013-11-06 08:12:35</td>
<td>2013-11-20 21:00:11</td>
<td>Some category</td>
</tr>
<tr class='odd'>
<td>

I want to extract the text inside every a-tag that is preceded by a span with the status_active class, for example "#365347 SOME-TITLE".

So in:

<a href="/tickets/365347-SOME-TITLE">#365347 SOME-TITLE</a>

I want to extract #365347 SOME-TITLE.

The same goes for every a-tag that follows a span.status_active.

I'm having a hard time with regex. I was thinking of getting all instances with regex, but I can't even get the first match.

I've tried patterns like from([\s\S]*?)to, but I'm really having a hard time wrapping my head around this.

The closest I've managed is:

(status_active)[^._]*(?=\.)

But not every title has a . at the end.

Is regex the way to go? If so could someone point me in the right direction?

2 Answers

Regex is not the way to go.

Please use an HTML parser (for example DOMParser):

const parser = new DOMParser();
const htmlDoc = parser.parseFromString(text, "text/html");
...

See also this famous SO answer... :-)
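
As a rough sketch of where the DOMParser snippet might go next: this assumes the full response text is parsed (not the regex-extracted fragment, since an HTML parser drops tr and td tags that are not inside a table), and that each ticket link sits in the same tr as its status span, as in the sample rows from the question. The function names parseHtml and activeTicketTitles are made up for illustration.

```javascript
// Browser-only: DOMParser is a browser API and is not available in plain Node.js.
function parseHtml(text) {
  return new DOMParser().parseFromString(text, 'text/html');
}

// Hypothetical helper: collects the link text from every row whose status span
// carries the status_active class. Assumes the span and the link share one <tr>.
function activeTicketTitles(doc) {
  const titles = [];
  doc.querySelectorAll('span.status_active').forEach(function (span) {
    const row = span.closest('tr');
    // The href prefix is taken from the sample markup in the question.
    const link = row ? row.querySelector('a[href^="/tickets/"]') : null;
    if (link) {
      titles.push(link.textContent.trim());
    }
  });
  return titles;
}
```

Inside the second then() callback of the question's code, this could be called as activeTicketTitles(parseHtml(text)).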

MarcoS

Try this one:

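var regex = /status_active.*?\n*.*<a.*?>(.*?)<\/a>/gm;
// With the g flag, match() returns only whole matches, so use matchAll() to read the capture group.
var matches = Array.from(text.matchAll(regex), function (m) { return m[1]; });
console.log(matches);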
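
To check the pattern against rows like the ones in the question, here is a small self-contained sketch. The sample string is abbreviated from the question's HTML and is only an assumption about the real page; the variable names are illustrative. It also shows that with the g flag, match() returns whole matched substrings, while matchAll() gives access to the capture group.

```javascript
// Abbreviated sample rows modelled on the question's HTML (not the real page).
const sample = [
  '<td class="ttstatus"><span class="status status_active">active</span></td>',
  '<td><a href="/tickets/365347-SOME-TITLE">#365347 SOME-TITLE</a></td>',
  '<td>2018-03-12 09:14:34</td>',
  '</tr>',
  "<tr class='even'>",
  '<td class="ttstatus"><span class="status status_closed">closed</span></td>',
  '<td><a href="/tickets/150640-vs-sender-e-post-brn001ba9bd7a93_000186">#150640 VS: SOME TITLE</a></td>',
].join('\n');

const pattern = /status_active.*?\n*.*<a.*?>(.*?)<\/a>/gm;

// match() with the g flag: an array of full matches, capture groups are dropped.
const wholeMatches = sample.match(pattern);

// matchAll(): one result per match, with the capture group at index 1.
const titles = Array.from(sample.matchAll(pattern), function (m) {
  return m[1];
});

console.log(titles); // [ '#365347 SOME-TITLE' ]
```

The pattern only works when the link is on the same line as the status span or on the line directly after it, so it is tied to this particular markup.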

Another approach would be to use jQuery to parse the text and use selectors to find the corresponding nodes. As MarcoS already stated, this would be a much cleaner solution, since regexes are not the best tool for parsing HTML or XML structures.

Paul