
I am trying to extract data from a site. When I visit the site manually I get its content, but when I fetch it with cURL I get something else instead and none of the site content at all. What would cause that?

Here is the site link:

The content I get with cURL looks something like this:

<div id="e_content" class="hide">{"ct":"F8WpFx0ojTx8r3KnG\/+3r1awNaWXVxuscAZUk\/UzDKHAKCPDFaTDfBXxQI6jCbvTOkRRdmORMzZGQffGW14m0Z2v53m0kldAyyMrByjAVX0k9UOla+VseAw6b+XRLZ4Jfd5EErjcjw+txG7nIPmGoP\/eC4lIZsNc63Zt5utbylIGF6x+Z+yneF1VqB65f5u\/IIds04HB6CZlpG9Ii5n0XrybFFN6yM4C1gWsf2TlFNQD3GHYeAbadQ3m4AOLc5X1zyOmz6Z3\/RIVu1Xy6KdA+tmLACgZ0iPchD1eAnhVn+2yNGA1aSAjb47FwqU2Tnpzp\/6Ha9ZLoFYTch3fgtLnfRm1ds3CmrIrgTsEMRGxtAJAkDSDTTTdNeS1\/V6UUTq\/si2KHeNv\/qOKWa\/Rmm7TU9kh7q0jQGY4QMBQUx7OQOqxwk6H7BhQd\/wOAT94qiW\/x+ika0V70oKE3\/EMGbngxexMoqpJ1Q8VEGJahCMl2aUg9Za4PmrqjEuTW5bK22tZeovKkOK\/3Tjdc2WFA+yVFLZXmxb89QOv6C1x79Cr4S+OnTebyd7PJxekvo0hqzcT6Le6e3WjUKNp1wBtL9QHCzlHWQmjnvvcKxOdsultuG7TP2I3sWlpROCRUm5kkEaH38kWiSuA6484rRoRbDLHkkdY1Ylt2zV0\/RX2gMmNQP3dSdlkdGZGkaXn6fjVkoSLtLfUiAyhtpc4wHG3zkwyE4aepQDR0xC+PFVOdiGHpKwhXUyUOCtHf9wU65thKjA6lvxW4D335jzCPz7lZ4P\/lrzMl9gU3moaoLQJs7HpN473ZxkgaibYc93YVUkHKHUId1KYxT1IJtMzFbg90drrnYDLxzIi8DYUuW+dmbrpV5WVyHcpng3RjWehetL62PheG5ToHX9Qfz3CpSrnbea\/BkXBML\/+UCR7purvYAdZup\/W8E107TYPHaXTzIbshtgEmJueJccUp7xDa+qE3VxDdsidLoUMXpX5IdPpgvfkgCrJn72ZPqPSq+R+ZrnAYeWWU+iBb31efMhziplzFkmlcLvh6yQwCpa8GrXFkNxuIiQKPJ2J6wPJcXxHuN0d9SfLyotFUe4AhPCPaJ\/QnTXB2UTvdsduM0blswoJbslAH51xuFjR89xho22hrnr08QVc68WhAIQKgru4MvXKsjYvoMC6YvTZCZ7xBVlsOu+snPDTUlNUp4+0twEfmtwaL5MHnV2sNY5\/TAFwmZfgnlAhnGc8+8DytuHHRyX2DIgsH45YBb4tAn0VpzUtZFdwDfJ7Dlwkq9oFoI\/HBhV5DsZ1MIlrx76A2Bqih3FQfjZRkEohQdV6YOPUUO7lRS+kC9Tyd4NI+q4HwQ2oYCMDhStGeOyMYq\/06E9a5MAMlxywQaKiS+FgyXk7sBO9bIGnlVHqjGDG0o7bQDtl6XjnAZ+su9tf5Dl9KkcG5cRfnUn5JTgg18jBkYJBMG5ygiDcfgrY4pPS56whxSJZ6aCjabJ9XvlDXLvIiXHcoBrfwPy9BEF75qXJflmvrNR4KyxOrU5zT3t\/v9RjdFapvxW1ZkQEWQsJJIHCOo+tYckPyIiVuASJzPw9aA22F8rF\/OhvMQ2k0QHy9\/95u+CjEJ8Jr0roF5lVHWxZS3CpzhgeuxPEouhO+CaatthxwQlcGJpff1DdCVkNWiie02Us6s8wq0NC7\/il04gYYcy7Q9b7Qqx4PqmrJRWKoIe7TsMHCYezrTGKKWDTNbPWAor3TnHO7sEqRbjE7p0LM7WF7KWOaxHVPR1uo\/v25fP638eU6+V7K4j4alQY7ut9PlMdQQVe1Wy7E\/xYv85LYXpPvOoaotHHydH6FFqrMq\/6Qe2KAaCvG6rusIYeTUyB379fzgulYQ1H8gkoE6axwiEv9+MqZG1jKLGNV2\/xfJ7ttTm+wUa2jR0R\/UE\/BO1
YF7sFHtGey54KArO81\/Z0eksDuIkYSir+IxpP8pCpPewQPmQbD+UcSzr96qOLx\/SwQNVJ5JnQK0ozubBWLlFqktgpk7bIc\/PQ+afpGhFds3+tz4RtkjRmo4borOkLnb\/VdZ1LIE0zqb3YPahmO3UW8Cm3n6wSZgIj0aSZW0iiC4D\/NvPblBuiSYA45wiYx4cgv21boUxF7Q815n26K6+W3Se74EAngsh+qa4VNwJzTz5t78MI9QU\/5W03JLfh\/SUwCPyld60V0hvi610BSLHiadRRSPawp8xTRdXaQYY4wD46R\/xuN0fS22ycfvMI7x9qfdFiRpL+uUELAwS3mY+Xwu1TxhNdVTy1MCLclmLEZpCDwSx5gLFfbPcgTle\/ZuK4ljxpXsMWG1act2q3kP51Up3vwFARMx2VPnp29oMxgFPxyocc2AQLxSTFixtp3rBXlQEdyqHf25NL44jPiXHeXjoSX8w8KuVbSOOZFvUC4qZpssoB51HF7bAlrcUalKKL+DdDZwJjoYIh88YgwsG2MSlUHusVxDSJ76Lwf9oTPLEMU6VFlxcouvXrTNI21Gs8FqGMMVFh+gZcabzIKUM12bY0Uruuzt6MMuAi+zSiXOlc9LRPT581GLsfAc4q9dEhbyQPD\/LEm90nKEOJqmRhb\/gfqdth1eo0v6YHb+8yu7fEY6ZlpWlPA1Jh\/Zu+pPIKjJMQahCmHK883bObE3ahZLgehFQJ7wZCgmSzu9biNdOYGFK6kiwr7aSeJQSUoZdR5AjV2oUrk3WYPJ27wOES2raxnfTMwB1BTzDMGvGLdoD2PyjD5XU68uJimBWj5eVK7Ug1jH\/lv2OmobN7KPz3iXzH\/8RYV7gBVcFOHAKx4dZqyvshH0KOsVvInymD87\/ScG5Nz+\/HW7f1VmvvONq8xs4xjIRyeyj1CNCrM1f1Gffwku\/PM3KI171U+bs+AfSxuET7UXzczmz+8iR17I5T0fVdQRmej+14aC0\/2FU6+5Dx2ypK0KWSmlj5Z2gRT3ft1sxppzlc0IRtiDToXmnxaXqpdrfj+wNITSiBH\/mVTAGeWrfizjkA9hkmPlGZ2xY1PyD3Sjxs3kpwtepic5ucyl6wyTvK2gLho264iuSg+brSEzGuGb20GIVjVGcEAPQ3okbyc3isUXD1Z0UVG\/evse\/42iNko+yooyeyXtE2GzK0rxVgpnwQC\/TbJ7YegMZhj7kK0+OlX772ELCIaESc2pnZsvZTGELQ2nKefvJFATLl+NSJiYH9HF14s+c9mxzAOqpX\/gXHRzXRo2Wd9+Zk6PBN9KYREqBMUXzjNUM1fNnzlqfBUCsSQXMA\/xia+RyY0HG7GMLmguA\/aR8NsoiFhLakANK4Q0HQf+tf5NT3\/rTrMyz+nNsNPyWLoljcQrCWoTBjbkIKA\/+Rz6+z\/RJyghm3PNN2fYl5D4Y0LHKyjV6gAWbrm7Bt15YQs4Qb+tGcZwICvOvpjy1zHGqxl\/wVoKGjlDHSM67VuPV44nfaxNMwG5ar0BoLzkFkzEkhBPYo\/oRdJur67lOw6GVvUNJ8VULBNBz0XUo5CPXjFsEEcV5NB\/ZeUBt8eOcLRY4hv2bYHP3gy2uIRnTbzPw0fdwnkSu16wzbpUJe90jPLCtWQHXQTGF3SZdxxIqMyveWHHkqAORsEIsvyxTzUfgJ77u4xY\/C1qFtTBrTsy3ahh3oYb1cE93PAWvQUT7ycsQ0FUEcFqU6BjVflwcL2UtmeCdzuGKSWrMNhLB+TB5\/vFgk4qP20RQk+EN7YtXPiTu2SA7EqDaqCxwTbY8XoMRv6\/drlxRrMnu8Ogdr2CmNdceWN9DydNENOEKD8v2ne4uCr4wCu82dyvZkNe0jPiiPkDcQabx\/e9TnQlEf0hHlFM7Udee7nEZVuO4iTGPPwXPv+tTj2mOn2d4x0zvfUaSAy3tUNME2ukqrC4htykV+mWLlFlL\
/fKDpkv4tnw+Odc2VWnCDvHEFVhePZJH8qfS3dgl17vp3uUqV15uBIABv2FF2CaSu2Hr7SJsret6M4IIGjGuZeVnJBPLMNZiA5DT5tradwKDR0Vkh4iGPssR3PoyMCq2+ZyzGBGy1rtk+6UCW6NjMIVIUGiwNiNNtPOEMf09XFN0+lOxYiyQwTJoH1Ipwq51\/yiuyG+V0Jn+xfUWjcXUaNwQ8QhScMnbcWd\/FMEeNtI1a+fvjJhwyEj206x6jZYX\/","s":"5e8c43f3cf7698f5"}</div>
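The cURL response is not rendered HTML at all: the `e_content` div holds a JSON object with two fields, `ct` and `s`. Below is a minimal sketch (JavaScript, Node.js assumed) that pulls that object out of a raw response so you can inspect it. `extractEncryptedPayload` is a hypothetical helper name, and reading `ct` as base64 ciphertext and `s` as a hex salt is an assumption based on the field names alone; the page does not confirm it.

```javascript
// Pull the JSON payload out of <div id="e_content"> in a raw (non-rendered) response.
// Assumption: the div contains a single JSON object, as in the cURL output above.
function extractEncryptedPayload(html) {
  const match = html.match(/<div[^>]*\bid="e_content"[^>]*>\s*(\{[\s\S]*?\})\s*<\/div>/);
  if (!match) {
    return null; // no JSON payload: the div probably holds rendered HTML already
  }
  // JSON.parse also turns the escaped "\/" sequences back into "/"
  const payload = JSON.parse(match[1]);
  return {
    ciphertext: Buffer.from(payload.ct, 'base64'), // assumed: base64-encoded ciphertext
    salt: payload.s,                               // assumed: hex-encoded salt
  };
}

// Usage with a shortened copy of the response shown above
const sample = '<div id="e_content" class="hide">{"ct":"F8WpFx0ojTx8r3KnG\\/+3r1aw","s":"5e8c43f3cf7698f5"}</div>';
const result = extractEncryptedPayload(sample);
console.log(result.ciphertext.length, result.salt);
```

Run against the browser's version of the same div (which already contains tables), the function returns `null`, which is a quick way to tell which of the two responses you received.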

Whereas the content I get when visiting the site manually is as follows:

<div class="hide" id="e_content" style="display: block;">
        <table cellpadding="5" cellspacing="0" style="border: 1px #000 dashed;" width="100%">
            <tbody>
                <tr>
                    <td><span style="padding: 0px 10px;"><a href="/" title="ExtraTorrent.cc - The Biggest Bittorent System">ExtraTorrent.cc</a> &gt; Popular Torrents &gt; <b>All Popular TV Torrents</b></span></td>
                </tr>
            </tbody>
        </table><br>
        <table border="0" cellpadding="0" cellspacing="0" width="100%">
            <tbody>
                <tr>
                    <td>
                        <h1>All Popular TV Torrents (159 torrents) <a href="/rss.xml?type=popular&amp;cid=8" title="RSS: All Popular TV Torrents"><img alt="RSS: All Popular TV Torrents" border="0" height="14" src="//images4et.com/images/rss.gif" width="32"></a></h1>
                    </td>
                    <td align="right">
                        <table border="0" cellpadding="0" cellspacing="0" width="100">
                            <tbody>
                                <tr>
                                    <td align="left" valign="middle">See&nbsp;also:</td>
                                    <td align="left" nowrap="nowrap" style="padding: 0px 5px;" valign="middle">
                                        <a href="/today/" title="Today Torrents">Today Torrents</a><br>
                                        <a href="/yesterday/" title="Yesterday Torrents">Yesterday Torrents</a>
                                    </td>
                                </tr>
                            </tbody>
                        </table>
                    </td>
                </tr>
            </tbody>
        </table><br>
        <table border="0" cellpadding="0" cellspacing="0" width="100%">
            <tbody>
                <tr>
                    <td style="padding: 5px;">
                        <b class="pager_no_link">1</b> <a class="pager_link" href="/view/popular/TV.html?page=2&amp;srt=seeds&amp;order=desc&amp;pp=50" title="2">2</a> <a class="pager_link" href="/view/popular/TV.html?page=3&amp;srt=seeds&amp;order=desc&amp;pp=50" title="3">3</a> <a class="pager_link" href="/view/popular/TV.html?page=4&amp;srt=seeds&amp;order=desc&amp;pp=50" title="4">4</a> <a class="pager_link" href="/view/popular/TV.html?page=2&amp;srt=seeds&amp;order=desc&amp;pp=50" title="2">&gt;</a> <a class="pager_link" href="/view/popular/TV.html?page=4&amp;srt=seeds&amp;order=desc&amp;pp=50" title="4">&gt;&gt;&gt;</a><br>
                    </td>
                    <td align="right" style="padding-right: 10px;">Torrents per page: <select name="torr_cat" onchange="Change(this);">
                        <option value="10">
                            10
                        </option>
                        <option value="25">
                            25
                        </option>
                        <option selected="selected" value="50">
                            50
                        </option>
                        <option value="100">
                            100
                        </option>
                    </select></td>
                </tr>
            </tbody>
        </table>
        <script type="text/javascript">
        function Change(el)
        {
            alert('You have no rights for this action!');
        }
        </script><br>
        <table class="tl">
            <thead>
                <tr>
                    <th colspan="2" width="100%">
                        <table>
                            <tbody>
                                <tr>
                                    <td>
                                        <a class="h2" href="/category/8/TV+Torrents.html" title="Browse TV Torrents"><img align="left" alt="Browse TV Torrents" hspace="10" src="//images4et.com/images/cat/8s.gif">&nbsp;TV torrents</a>&nbsp;
                                    </td>
                                    <td width="60">
                                        <a href="/rss.xml?type=popular&amp;cid=8" title="RSS: TV Torrents"><img alt="RSS" src="//images4et.com/images/rss.gif"></a>&nbsp;
                                    </td>
                                    <td width="18">
                                        <a href="/view/popular/TV.html?page=1&amp;srt=comments&amp;pp=50" title="Sort torrents by comments"><img alt="Sort" src="//images4et.com/images/icon_comments.gif"></a>
                                    </td>
                                </tr>
                            </tbody>
                        </table>
                    </th>
                    <th>
                        <a href="/view/popular/TV.html?page=1&amp;srt=added&amp;pp=50&amp;order=desc" title="Sort torrents by added time">Added</a>&nbsp;
                    </th>
                    <th>
                        <a href="/view/popular/TV.html?page=1&amp;srt=size&amp;pp=50&amp;order=desc" title="Sort torrents by size">Size</a>&nbsp;<a href="/view/popular/TV.html?page=1&amp;srt=size&amp;pp=50&amp;order=desc" title="Sort torrents by size"><img alt="Sort" src="//images4et.com/images/sort.gif"></a>
                    </th>
                    <th>
                        <a href="/view/popular/TV.html?page=1&amp;srt=seeds&amp;pp=50&amp;order=asc" title="Sort torrents by seeds">S</a>&nbsp;<a href="/view/popular/TV.html?page=1&amp;srt=seeds&amp;pp=50&amp;order=asc" title="Sort torrents by seeds"><img alt="Sort" src="//images4et.com/images/sort2.gif"></a>
                    </th>
                    <th>
                        <a href="/view/popular/TV.html?page=1&amp;srt=leechers&amp;pp=50&amp;order=desc" title="Sort torrents by leechers">L</a>&nbsp;<a href="/view/popular/TV.html?page=1&amp;srt=leechers&amp;pp=50&amp;order=desc" title="Sort torrents by leechers"><img alt="Sort" src="//images4et.com/images/sort.gif"></a>
                    </th>
                    <th>Health</th>
                </tr>
            </thead>
            <tbody>
                <tr class="tlr">
                    <td>
                        <a href="/download/5499853/Designated.Survivor.S01E11.HDTV.x264-LOL%5Bettv%5D.torrent" title="Download Designated.Survivor.S01E11.HDTV.x264-LOL[ettv] torrent"><img alt="Download" src="//images4et.com/images/icon_download3.gif"></a><a href="magnet:?xt=urn:btih:873dfd9547d7d9822c8a73c34d60d8d99cf8b083&amp;dn=Designated.Survivor.S01E11.HDTV.x264-LOL%5Bettv%5D&amp;tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&amp;tr=udp%3A%2F%2Fglotorrents.pw%3A6969%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&amp;tr=udp%3A%2F%2Fzer0day.to%3A1337%2Fannounce&amp;tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969%2Fannounce" title="Magnet link"><img alt="Magnet link" hspace="3" src="//images4et.com/images/magnet2.png"></a>
                    </td>
                    <td class="tli">
                        <div id="tcmm">
                            <a href="/torrent/5499853/Designated.Survivor.S01E11.HDTV.x264-LOL%5Bettv%5D.html#comments" title="View comments">2<img alt="2 comments" class="icon" hspace="2" src="//images4et.com/images/icon_comment.gif"></a>
                        </div><img alt="English" class="icon" src="//images4et.com/images/flags/mini/uk-us.gif">&nbsp;<a href="/torrent/5499853/Designated.Survivor.S01E11.HDTV.x264-LOL%5Bettv%5D.html" title="view Designated.Survivor.S01E11.HDTV.x264-LOL[ettv] torrent">Designated.Survivor.S01E11.HDTV.x264-LOL[ettv]</a> <sup class="nano done" title="Users rating">7.50</sup> <span class="c_tor">in <a href="/category/113/Other+Torrents.html" title="Browse Other">Other</a></span><span class="c_tor">, by</span> <span class="micro"></span>
                        <div class="usr" onmouseout="umStop('uettv608');" onmouseover="umStart('uettv608');">
                            <span class="micro"></span>
                            <div class="usrm" id="uettv608">
                                <span class="micro"></span>
                            </div><span class="micro"><a href="/profile/ettv/" style="color:#CC0000;">ettv</a></span>
                        </div>
                    </td>
                    <td>5h</td>
                    <td>241.45&nbsp;MB</td>
                    <td class="sy">23624</td>
                    <td class="ly">20275</td>
                    <td>
                        <div class="r10"></div>
                    </td>
                </tr>
            </tbody>
        </table>
        <script type="text/javascript">
        eval(function(p,a,c,k,e,d){e=function(c){return c.toString(36)};if(!''.replace(/^/,String)){while(c--){d[c.toString(a)]=k[c]||c.toString(a)}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('g(["2.f","4.6","h.0","i.0","2.j","e.0","2.d","9.0","2.b"].c(k.7.l)==-1||3.u!==3.5){3.5.7.x("s://4.6/m/o/p.q?r=1&n=v&t=a&8=w")}',34,34,'com||extratorrent|window|extra|top|to|location|order|extratorrentonline|50|works|indexOf|one|extratorrentlive|cc|if|etmirror|etproxy|life|document|hostname|view|srt|popular|TV|html|page|https|pp|self|seeds|desc|replace'.split('|'),0,{}))
        </script>
    </div>

I suspect that the following script, found at the end of the page, decrypts the HTML data: the content is loaded in encrypted form and the JavaScript decodes it again. How can I reverse engineer it so that I can decrypt the content anywhere? Right now it only gets decrypted on the site itself, when I visit it manually in a browser.

<script type="text/javascript">
            eval(function(p,a,c,k,e,d){e=function(c){return c.toString(36)};if(!''.replace(/^/,String)){while(c--){d[c.toString(a)]=k[c]||c.toString(a)}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('g(["2.f","4.6","h.0","i.0","2.j","e.0","2.d","9.0","2.b"].c(k.7.l)==-1||3.u!==3.5){3.5.7.x("s://4.6/m/o/p.q?r=1&n=v&t=a&8=w")}',34,34,'com||extratorrent|window|extra|top|to|location|order|extratorrentonline|50|works|indexOf|one|extratorrentlive|cc|if|etmirror|etproxy|life|document|hostname|view|srt|popular|TV|html|page|https|pp|self|seeds|desc|replace'.split('|'),0,{}))
            </script>
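You can see what this packed script does without a browser by reversing the packing step instead of calling `eval`. The sketch below (JavaScript, runnable in Node.js) takes the arguments the script passes to its packer function (the payload string, the radix `34`, and the `|`-separated word list) and swaps each base-34 token back for its word. It assumes a radix of at most 36, which holds here; `unpack` is a hypothetical helper, not part of any library.

```javascript
// Reverse the "p,a,c,k,e,d" packing without eval: every word token in the
// payload is an index, written in base `radix`, into the keyword list.
function unpack(payload, radix, keywords) {
  const dictionary = new Map();
  keywords.forEach((word, index) => {
    const token = index.toString(radix); // same token format the packer uses
    dictionary.set(token, word || token); // empty entries map to themselves
  });
  return payload.replace(/\b\w+\b/g, (token) =>
    dictionary.has(token) ? dictionary.get(token) : token);
}

// Arguments copied from the packed script in the question
const payload = 'g(["2.f","4.6","h.0","i.0","2.j","e.0","2.d","9.0","2.b"].c(k.7.l)==-1||3.u!==3.5){3.5.7.x("s://4.6/m/o/p.q?r=1&n=v&t=a&8=w")}';
const keywords = 'com||extratorrent|window|extra|top|to|location|order|extratorrentonline|50|works|indexOf|one|extratorrentlive|cc|if|etmirror|etproxy|life|document|hostname|view|srt|popular|TV|html|page|https|pp|self|seeds|desc|replace'.split('|');

const unpacked = unpack(payload, 34, keywords);
console.log(unpacked);
```

The output is a single `if` statement that compares `document.location.hostname` against a list of the site's own hostnames and calls `window.top.location.replace(...)` with an `https://extra.to/view/popular/TV.html?...` URL. It never reads the JSON in `e_content`, so this is not the decryption routine.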

My cURL code:

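<?php
$url = "http://extratorrent.cc/view/popular/TV.html?page=1&srt=seeds&pp=50&order=desc";

$ch = curl_init();

curl_setopt($ch, CURLOPT_AUTOREFERER, true);      // only takes effect when redirects are followed
curl_setopt($ch, CURLOPT_HEADER, false);          // leave response headers out of $data
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // return the body instead of printing it
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);  // do not follow redirects

$data = curl_exec($ch);
if ($data === false) {
    echo 'cURL error: ' . curl_error($ch);
}
curl_close($ch);

// $data is the raw HTML; cURL does not run the page's JavaScript
print_r($data);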
Umair Shah
  • The JavaScript code in the last code box is only for redirecting you to the main page, if the page is loaded in a frame. Go to http://jsbeautifier.org/ and paste the whole `eval(...)` expression there to see it. JSBeautifier will deobfuscate the code for you. – Neonit Mar 09 '17 at 08:31
  • @Neon: I am not very familiar with JS, so could you please check that for me? – Umair Shah Mar 09 '17 at 08:37
  • Show us your CURL method. – Niek van der Maaden Mar 09 '17 at 08:39
  • I'm sure you can translate that JavaScript code to PHP. Looks doable. – apokryfos Mar 09 '17 at 09:02
  • @Umair Shah Yousafzai: Check what? I just didn't want to paste the whole code into a comment; that's why I pointed you to the page so you could check it yourself. But I have already explained everything the code does. It has nothing to do with decoding the JSON data you pasted in the first code box. – Neonit Mar 09 '17 at 09:31
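Unpacked (for instance with the beautifier Neonit links), the script in the question reduces to one condition: redirect when `document.location.hostname` is not one of the site's own hostnames, or when `window.self !== window.top` (the page is inside a frame). So besides the frame case, it also redirects copies served from other hostnames. A minimal JavaScript sketch restating that condition as a plain function; `shouldRedirect`, `hostname` and `isFramed` are hypothetical names standing in for the browser values, and the hostname list is copied from the decoded script.

```javascript
// Hostnames copied from the unpacked frame-busting script in the question
const ALLOWED_HOSTS = [
  'extratorrent.cc', 'extra.to', 'etmirror.com', 'etproxy.com',
  'extratorrent.life', 'extratorrentlive.com', 'extratorrent.one',
  'extratorrentonline.com', 'extratorrent.works',
];

// Mirrors the script's test:
//   [...].indexOf(document.location.hostname) == -1 || window.self !== window.top
// `hostname` and `isFramed` are hypothetical stand-ins for those browser values.
function shouldRedirect(hostname, isFramed) {
  return ALLOWED_HOSTS.indexOf(hostname) === -1 || isFramed;
}

console.log(shouldRedirect('extratorrent.cc', false)); // false: own host, top-level window
console.log(shouldRedirect('extratorrent.cc', true));  // true: loaded inside a frame
console.log(shouldRedirect('example.com', false));     // true: served from another host
```

When the condition holds, the script only calls `window.top.location.replace(...)`; in neither case does it touch the `e_content` JSON.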

2 Answers


You could probably use a headless browser like PhantomJS to programmatically retrieve the page contents while allowing the JavaScript to do its thing.

klumme
  • Some code would be useful, as I am not familiar with `PhantomJS` and how it works. – Umair Shah Mar 09 '17 at 08:45
  • @UmairShahYousafzai there are lots of examples at http://phantomjs.org/examples/ . You can download PhantomJS and run a script with `phantomjs example-script.js`. – apokryfos Mar 09 '17 at 09:01

It could be that this website checks the User-Agent HTTP header. Try setting a different User-Agent with the -A parameter, like this:

curl -A "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3" http://yahoo.com

Or using PHP:

curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3");

cURL does not execute JavaScript.

Update: it seems the JavaScript does need to be executed to 'unpack' the content, so a headless browser, as suggested in one of the other answers, is more appropriate.

Addendum to klumme's answer

A code example that fetches the 'unpacked' version of a document by running its JavaScript in a PhantomJS environment (taken from https://stackoverflow.com/a/12469284/694400):

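var page = require('webpage').create();
// page.open loads the URL and runs the page's scripts, as a browser would
page.open('http://google.com', function (status) {
  if (status === 'success') {
    console.log(page.content); // current DOM, including changes scripts made while loading
  }
  phantom.exit();
});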

Or, from PHP, using http://jonnnnyw.github.io/php-phantomjs/ (which actually just drives the PhantomJS executable); example taken from jonnnnyw's GitHub page:

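<?php

// Load Composer's autoloader (assumes the library was installed with Composer)
require __DIR__ . '/vendor/autoload.php';

use JonnyW\PhantomJs\Client;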

$client = Client::getInstance();

/** 
 * @see JonnyW\PhantomJs\Http\Request
 **/
$request = $client->getMessageFactory()->createRequest('http://jonnyw.me', 'GET');

/** 
 * @see JonnyW\PhantomJs\Http\Response 
 **/
$response = $client->getMessageFactory()->createResponse();

// Send the request
$client->send($request, $response);

if($response->getStatus() === 200) {

    // Dump the requested page content
    echo $response->getContent();
}
Wieger