
Can someone please help me adapt this regex? Something is not working and I can't see what or why. I also need to extract one more part of the string, but I don't know how.

This is the end result I want:

**Match[4]** 6ГБ
**Match[5]** 11
**Match[6]** 0
**Match[7]** 10.08.2013 в 22:29
**Match[8]** class="r1"

      var re = /<img src="([\s\S]*?)"[\s\S]*?<td class="nam"><a href="([\s\S]*?)"[\s\S]*?>([\s\S]*?)<\/a><td class='s'>[\s\S]*?<td class='s'>([\s\S]*?)<\/td><td class='sl_s'>\d<\/td><td class='sl_p'>\d<\/td>/g;   

The regex is still missing captures for `class="r1"` (the number can change: r0, r1, r2, r3, ...) and for the date/time in Match[7] (10.08.2013 в 22:29).

The string I am searching in:

        <div class='pad0x0x5x0 center'><a href='/pay.php' title='Благодарим за поддержку!'><img src='https://jnka-dot-com-st.appspot.com/pic/banners/pay_bn2.png' style='display: inline-block;'></a></div>
        <div class="bx2_0"><table class="t_peer w100p" cellspacing=0 cellpadding=0><tr class="mn"><td class="z w90"></td><td class="w90p"></td><td class="z">Комм.</td><td class="z">Размер</td><td class="z">Сидов</td><td class="z">Пиров</td><td class="z">Залит</td><td class='zl'>Раздает</td></tr>
        <tr class='first bg'><td class="bt"><img src="https://jnka-dot-com-st.appspot.com/pic/cat/13.gif" class="p90x32 pointer" onclick="cat(13);" alt=""></td><td class="nam"><a href="/details.php?kU3lM9Nn1&amp;id=1014968" class="r0">Чужой 4: Воскрешение (Режиссёрская версия) / Alien: Resurrection / 1997 / ДБ, СТ / BDRip</a><td class='s'>2</td>
        <td class='s'>2.18 ГБ</td>
        <td class='sl_s'>13</td>
        <td class='sl_p'>1</td>
        <td class='s'>07.10.2012 в 17:30</td>
        <td class='sl'><a href='/userdetails.php?id=77946' class=u5>vladislav75</a><i class="i1 s9-10"></i></td></tr>
        <tr class=bg><td class="bt"><img src="https://jnka-dot-com-st.appspot.com/pic/cat/18.gif" class="p90x32 pointer" onclick="cat(18);" alt=""></td><td class="nam"><a href="/details.php?kU3lM9Nn1&amp;id=1106257" class="r1">Древние пришельцы (4 сезон: 1-10 серия из 10) / Ancient Aliens / 2012 / ДБ / SATRip</a><td class='s'>10</td>
        <td class='s'>6 ГБ</td>
        <td class='sl_s'>11</td>
        <td class='sl_p'>0</td>
        <td class='s'>10.08.2013 в 22:29</td>
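A possible starting point, offered only as a sketch: if the fetched HTML really has line breaks and indentation between the cells, as in the sample above, the original pattern cannot match, because it requires `</td><td class='sl_s'>` with nothing in between. It also uses `\d`, which matches a single digit, while the seed counts in the sample are 13 and 11. The version below allows optional whitespace (`\s*`) between cells, uses `\d+`, and adds captures for the `rN` class and the upload date. It assumes the attribute order seen in the sample (`href` followed directly by `class`). Capture groups are numbered by the position of their opening parenthesis, and `class="r1"` comes before the title in the markup, so the class cannot be group 8 in a single left-to-right pattern. The hypothetical helper `parseRows` therefore copies the groups into an object with named fields. Named groups and lookbehind are avoided in case the plugin runs on an older JavaScript engine.

```javascript
// Sketch: one match per result row. \s* tolerates the line breaks and
// indentation between cells; \d+ allows multi-digit seed/peer counts.
// Assumes href is followed directly by class="rN", as in the sample.
var rowRe = /<img src="([^"]*)"[\s\S]*?<td class="nam"><a href="([^"]*)" class="(r\d+)">([\s\S]*?)<\/a>\s*<td class='s'>\d+<\/td>\s*<td class='s'>([^<]*)<\/td>\s*<td class='sl_s'>(\d+)<\/td>\s*<td class='sl_p'>(\d+)<\/td>\s*<td class='s'>([^<]*)<\/td>/g;

// Hypothetical helper: returns one object per row, so the group numbers
// (1-icon, 2-link, 3-class, 4-title, 5-size, 6-seeds, 7-peers, 8-date)
// do not have to line up with the numbering wanted in the question.
function parseRows(html) {
  var rows = [];
  var m;
  rowRe.lastIndex = 0; // global regex: always start from the beginning
  while ((m = rowRe.exec(html)) !== null) {
    rows.push({
      icon: m[1],
      link: m[2],
      cls: m[3],
      title: m[4],
      size: m[5],
      seeds: m[6],
      peers: m[7],
      date: m[8]
    });
  }
  return rows;
}
```

With the two sample rows above, the second object should have `cls` `"r1"`, `size` `"6 ГБ"`, `seeds` `"11"`, `peers` `"0"` and `date` `"10.08.2013 в 22:29"`.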

Full code:

    function scraper_search(page, url, title, paginator) {
      page.entries = 0;
      var fromPage = 0,
          tryToSearch = true;
      // 1-icon, 2-filelink, 3-title, 4-size, 5-seeds, 6-peers, 7-time uploaded, 8-class r"x".
      var re = /<img src="([\s\S]*?)"[\s\S]*?<td class="nam"><a href="([\s\S]*?)"[\s\S]*?>([\s\S]*?)<\/a><td class='s'>[\s\S]*?<td class='s'>([\s\S]*?)<\/td><td class='sl_s'>\d<\/td><td class='sl_p'>\d<\/td>/g;

      function loader() {
          if (!tryToSearch) return false;

          page.loading = true;
          var doc;

          if (paginator == '1')
              doc = showtime.httpReq(unescape(url) + fromPage).toString();
          else
              doc = showtime.httpReq(unescape(url)).toString();

          //console.log(doc);
          page.loading = false;
          var doc1 = doc.match(/<table class="t_peer w100p"[\s\S]*?<\/table>/);
          var match = re.exec(doc1);
          while (match) {
              page.appendItem(plugin.getDescriptor().id + "3" + ':index:' + escape(checkUrl(match[2])), 'video', {
                  title: new showtime.RichText(match[3] + ' ' + colorStr(match[4], orange) + '' + match[5] + '/' + match[6]),
                  icon: checkUrl(match[1])
              });

              page.entries++;
              match = re.exec(doc1);
          };

          if (!doc.match(/<table class="t_peer w100p"[\s\S]*?<a href="/)) return tryToSearch = false;

          fromPage++;
          //showtime.print(fromPage);
          loader();
      };

      loader();
      //page.paginator = loader;

      if (page.entries == 0)
          page.error("По заданному запросу ничего не найдено"); // "Nothing was found for this query"

      page.loading = false;

      if (page.entries > 50)
          return false;

      page.loading = false;
    }

Could you also help me adapt this `while` loop to stop after 50 results?

I tried something like this:

    while (match && page.entries < 50)

but it does not work that way.
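One way to apply the cap, sketched under assumptions: in the posted code, `while (match && page.entries < 50)` does stop appending items, but `loader()` still calls itself for the next page, so pages keep being fetched until one no longer contains a results table with a link. Also, because `re` is global (`/g`), leaving the loop early leaves `re.lastIndex` pointing into the previous page's text. The sketch replaces the recursion with a loop that checks the count before fetching each page and resets `lastIndex` for every page. `loadResults`, `fetchPage`, `addItem` and `MAX_RESULTS` are hypothetical names standing in for `loader()`, `showtime.httpReq(...)`, `page.appendItem(...)` and the limit of 50; stopping when a page yields no rows is also an assumption that differs from the original `<a href="` check.

```javascript
// Sketch: iterative paging with a hard cap on the number of results.
// fetchPage(pageIndex) -> HTML string and addItem(match) are hypothetical
// stand-ins for showtime.httpReq(...) and page.appendItem(...).
var MAX_RESULTS = 50;

function loadResults(fetchPage, addItem, rowRe) {
  var count = 0;
  for (var pageIndex = 0; count < MAX_RESULTS; pageIndex++) {
    var html = fetchPage(pageIndex);
    // match() without /g returns an array (or null); table[0] is the text.
    var table = html.match(/<table class="t_peer w100p"[\s\S]*?<\/table>/);
    if (!table) break;            // no result table: stop paging

    var found = 0;
    var m;
    rowRe.lastIndex = 0;          // global regex: restart for every page
    while (count < MAX_RESULTS && (m = rowRe.exec(table[0])) !== null) {
      addItem(m);
      count++;
      found++;
    }
    if (found === 0) break;       // assumption: an empty page means no more results
  }
  return count;
}
```

Checking the count in both the `for` and the `while` condition means the function neither appends more than 50 items nor fetches another page once the limit is reached.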

Thank you in advance for your help.

KARTHIKEYAN.A
fil brinza
  • Instead of [messing with regular expression](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454), just parse the string as HTML and get the relevant elements with `.querySelectorAll()` or `.getElementsByTagName()`. – Andreas Jan 02 '18 at 12:02
  • I would need to rewrite the whole script. The code I pasted is just 10% of the whole script. – fil brinza Jan 02 '18 at 12:11
