
I'm trying to grab the magnet link from the following HTML:

rawdata = ''' <div class="iaconbox center floatright">
            <a rel="12624681,0" class="icommentjs kaButton smallButton rightButton" href="https://kat.cr/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681.html#comment">209 <i class="ka ka-comment"></i></a>               <a class="icon16" href="https://kat.cr/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681.html" title="Verified Torrent"><i class="ka ka16 ka-verify ka-green"></i></a>                                <div data-sc-replace="" data-sc-slot="_ae58c272c09a10c792c6b17d55c20208" class="none" data-sc-params="{ &#39;name&#39;: &#39;Zootopia%202016%201080p%20HDRip%20x264%20AC3-JYK&#39;, &#39;extension&#39;: &#39;mkv&#39;, &#39;magnet&#39;: &#39;magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&amp;dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&amp;tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&#39; }"></div>
            <a data-nop="" title="Torrent magnet link" href="magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&amp;dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&amp;tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce" class="icon16 askFeedbackjs" data-id="CE8357DED670F06329F6028D2F2CEA6F514646E0"><i class="ka ka16 ka-magnet"></i></a>
            <a data-download="" title="Download torrent file" href="https://kat.cr/torrents/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681/" class="icon16 askFeedbackjs"><i class="ka ka16 ka-arrow-down"></i></a>
        </div> '''

Using this command

rawdata[rawdata.find("<")+1:rawdata.find(">")]

Gives me

div class="iaconbox center floatright"

But when I try to find the magnet link with

rawdata[rawdata.find("href="magnet:?")+1:rawdata.find(""")]

It gives me

' '

What I actually want it to give me is:

magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce

This is easy to do in the shell, but it has to be done in Python itself.

SamFlynn

4 Answers


Try using single quotes so the string literals are valid:

rawdata[rawdata.find('href="magnet:?')+1:rawdata.find('"')]

HenryM

It's better to use a regular expression.

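import html
import re

rawdata = '''your rawdata......'''
# Capture the href value that is immediately followed by class="icon16...
regex = re.compile(r'href="(.+?)" class="icon16')
# search() returns the first match (the magnet link in the question's HTML);
# html.unescape() turns &amp; back into &
magnet_href = html.unescape(regex.search(rawdata).group(1))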
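If the page contains several magnet links, a variant of this regex can anchor on the magnet: scheme instead of the neighbouring class attribute and collect every match with re.findall(). This is a sketch under assumptions: the rawdata below is a shortened, made-up sample, and each link is assumed to sit in a double-quoted href attribute.

```python
import html
import re

# Hypothetical, shortened sample: two magnet links and one ordinary link.
rawdata = '''<a href="magnet:?xt=urn:btih:AAA&amp;dn=first" class="icon16">
<a href="https://example.com/page.html" class="icon16">
<a href="magnet:?xt=urn:btih:BBB&amp;dn=second" class="icon16">'''

# Require "magnet:" right after href=" and stop at the closing quote,
# so ordinary links are skipped and a match cannot run past the attribute.
pattern = re.compile(r'href="(magnet:[^"]+)"')

# findall() returns the captured group of every match; unescape() maps &amp; to &.
magnet_links = [html.unescape(m) for m in pattern.findall(rawdata)]
print(magnet_links)
```

Anchoring on magnet: right after href=" also skips the data-sc-params attribute in the question's HTML, where the same link is wrapped in &#39; instead.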

First of all, as pointed out by HenryM, you need to use single quotes or escape the inner double quotes to make the strings valid.

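Second, find() always returns the index of the first occurrence. So rawdata.find('"') finds the first " in the string, not the one ending the link. To fix this, pass a start index as the second argument so the search begins where the link starts. Note that str.find() takes its arguments positionally; find('"', beg=start) raises a TypeError.

Additionally, find() gives you the index where the match begins, not where it ends, so you need to add the length of the href=" prefix to the start index. (Adding the length of the whole 'href="magnet:?' query would also cut "magnet:?" off the result.) Finally, the attribute value is HTML-escaped, so &amp; has to be turned back into & with html.unescape(). The code would look something like this:

import html

start = rawdata.find('href="magnet:?') + len('href="')  # skip href=" but keep "magnet:?"
end = rawdata.find('"', start)  # first " after the link starts
link = html.unescape(rawdata[start:end])  # &amp; -> &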
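The same find()-based idea extends to several links by passing the end of the previous match as the start argument of the next search. A minimal sketch, assuming every link is written as href="magnet:?..." with double quotes; iter_magnet_hrefs and the sample string are hypothetical:

```python
import html


def iter_magnet_hrefs(text):
    """Yield every href="magnet:?..." value in text, HTML-unescaped.

    Hypothetical helper built only on str.find(); it assumes the attribute
    is written exactly as href="..." with double quotes.
    """
    marker = 'href="magnet:?'
    pos = text.find(marker)
    while pos != -1:  # find() returns -1 once there are no more matches
        start = pos + len('href="')  # skip href=" but keep "magnet:?"
        end = text.find('"', start)  # closing quote of this attribute
        if end == -1:
            break  # unterminated attribute: stop instead of slicing garbage
        yield html.unescape(text[start:end])
        pos = text.find(marker, end)  # resume searching after this link


# Hypothetical sample: two magnet links around an ordinary link.
sample = ('<a href="magnet:?xt=urn:btih:AAA&amp;dn=one" class="icon16">'
          '<a href="https://example.com/" class="icon16">'
          '<a href="magnet:?xt=urn:btih:BBB&amp;dn=two" class="icon16">')
links = list(iter_magnet_hrefs(sample))
print(links)
```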
Leon

The input data is an HTML fragment. You should not be using regular expressions to parse it.

Use a parser instead. Here is a working sample using the BeautifulSoup HTML parser:

from bs4 import BeautifulSoup


rawdata = ''' <div class="iaconbox center floatright">
    <a rel="12624681,0" class="icommentjs kaButton smallButton rightButton" href="https://kat.cr/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681.html#comment">209 <i class="ka ka-comment"></i></a>               <a class="icon16" href="https://kat.cr/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681.html" title="Verified Torrent"><i class="ka ka16 ka-verify ka-green"></i></a>                                <div data-sc-replace="" data-sc-slot="_ae58c272c09a10c792c6b17d55c20208" class="none" data-sc-params="{ &#39;name&#39;: &#39;Zootopia%202016%201080p%20HDRip%20x264%20AC3-JYK&#39;, &#39;extension&#39;: &#39;mkv&#39;, &#39;magnet&#39;: &#39;magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&amp;dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&amp;tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&#39; }"></div>
    <a data-nop="" title="Torrent magnet link" href="magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&amp;dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&amp;tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce" class="icon16 askFeedbackjs" data-id="CE8357DED670F06329F6028D2F2CEA6F514646E0"><i class="ka ka16 ka-magnet"></i></a>
    <a data-download="" title="Download torrent file" href="https://kat.cr/torrents/zootopia-2016-1080p-hdrip-x264-ac3-jyk-t12624681/" class="icon16 askFeedbackjs"><i class="ka ka16 ka-arrow-down"></i></a>
</div> '''

soup = BeautifulSoup(rawdata, "html.parser")
print(soup.find("a", title="Torrent magnet link")["href"])

Prints:

magnet:?xt=urn:btih:CE8357DED670F06329F6028D2F2CEA6F514646E0&dn=zootopia+2016+1080p+hdrip+x264+ac3+jyk&tr=udp%3A%2F%2Ftracker.publicbt.com%2Fannounce&tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
alecxe
  • If I were to find multiple instances of the Magnet link, will findall() do the job here? – SamFlynn Jun 12 '16 at 17:18
  • @SamFlynn yes, sure: use the `find_all()` method and get the `href` attribute of every element found in the loop. – alecxe Jun 12 '16 at 17:27
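If installing BeautifulSoup isn't an option, the standard library's html.parser.HTMLParser can do a similar job: collect the href of every <a> tag whose value starts with magnet:. This swaps in a different parser than the answer above; MagnetLinkParser and the sample markup are hypothetical:

```python
from html.parser import HTMLParser


class MagnetLinkParser(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag pointing to a magnet: URI."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; HTMLParser has already
        # replaced entity references such as &amp; in the values.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.startswith("magnet:"):
                self.links.append(value)


# Hypothetical, shortened sample in the shape of the question's HTML.
rawdata = '''<div class="iaconbox">
<a title="Torrent magnet link" href="magnet:?xt=urn:btih:AAA&amp;dn=one" class="icon16"></a>
<a title="Download torrent file" href="https://example.com/file/" class="icon16"></a>
<a title="Torrent magnet link" href="magnet:?xt=urn:btih:BBB&amp;dn=two" class="icon16"></a>
</div>'''

parser = MagnetLinkParser()
parser.feed(rawdata)
parser.close()
print(parser.links)
```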