I need some sort of algorithm that would extract the video link from an mp4engine page.

Here is an example of a page I want to scrape.

Desired output in this case would be: http://mp4engine.com:182/d/a2chmyndcqqgkpskitclvbgu5pgwxve2vmlrdsctpwbte2flb4i4hrz6/.hack_Roots (Dub) Episode 001-360p.mp4

I tried to use HtmlAgilityPack to get the player code, but the script is obfuscated with the p,a,c,k,e,d packer, and I'm unable to execute it inside my C# Windows Phone 8.1 project. I thought about using the Jurassic package to execute the JS, but it doesn't seem to work on Windows Phone 8.1.

Here is the script I get using HAP:

<script type='text/javascript'>eval(function(p,a,c,k,e,d){while(c--    )if(k[c])p=p.replace(new RegExp('\\b'+c.toString(a)+'\\b','g'),k[c]);return p} ('15("14").13({f:"0://2.1:e/d/c/.b (a) 9 8- 7.6",12:"0://2.1/4/h.g",11:"0://2.1/i/10/z.y",x:"w",v:u,t:s,5:"0",r:"0://2.1/4/q /p",o:[{3:"n",m:"0://2.1/4/h.g"},{3:"l",k:{f:\'0://2.1:e/d/c/.b (a) 9 8- 7.6\',\'5\':\'0\'}},{3:"j"}],});',36,42,'http|com|mp4engine|type|player|provider|mp4|360p|001|Episode|Dub|hack_Roots|a2chmyndcqqgkpskitclvbgu5pgwxve2vmlrdsctpwbte2flb4i4hrz6||182|file|swf|jw6||download|config|html5|src|flash|modes|six|skins|skin|420|height|722|width|1484|duration|jpg|hahgl235zwv2|00000|image|flashplayer|setup|flvplayer|jwplayer'.split('|')))

I have also tried using the built-in WebView control:

WebView wv = new WebView();
//... navigation to string and all that
var res = await wv.InvokeScriptAsync("eval", null);

Unfortunately, the call returns an empty string (res == "").

I have also searched for a Base64 string that I could decode, but the page doesn't seem to have one.

What can I do to get the video URL?

Reynevan

2 Answers

Inside the <div id="player_code" ...> element, the last <script> tag contains the obfuscated JavaScript code that holds the video URL. This site can deobfuscate the code, and the result looks like this:

jwplayer("flvplayer").setup({
    file: "http://mp4engine.com:182/d/a2chmyndcqqgkpskitclvbgu5pgwxve2vmlrdsctpwbsg7asjwghgk4p/.hack_Roots (Dub) Episode 001-360p.mp4",
    flashplayer: "http://mp4engine.com/player/jw6.swf",
    image: "http://mp4engine.com/i/00000/hahgl235zwv2.jpg",
    duration: "1484",
    width: 722,
    height: 420,
    provider: "http",
    skin: "http://mp4engine.com/player/skins/six",
    modes: [{
        type: "flash",
        src: "http://mp4engine.com/player/jw6.swf"
    }, {
        type: "html5",
        config: {
            file: 'http://mp4engine.com:182/d/a2chmyndcqqgkpskitclvbgu5pgwxve2vmlrdsctpwbsg7asjwghgk4p/.hack_Roots (Dub) Episode 001-360p.mp4',
            'provider': 'http'
        }
    }, {
        type: "download"
    }],
});

As you can see, file: contains the video URL. So, in your C# code, you can download the HTML of the page, find the last <script> tag inside <div id="player_code" ...>, and deobfuscate it using the C# port listed on the same site.
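If pulling in a full port is not an option, the substitution loop from the packed script is small enough to reproduce by hand. The following is a minimal sketch, not the port mentioned above: the names PackedJs, Unpack, ToBase and ExtractFile are made up for illustration, and it assumes the simple packer variant shown in the question, where tokens are produced by c.toString(a) with a radix of at most 36.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; mirrors the loop in the packed script from the question.
static class PackedJs
{
    // Encodes n in the given radix (2..36) with digits 0-9 then a-z,
    // like JavaScript's Number.prototype.toString(radix).
    static string ToBase(int n, int radix)
    {
        const string digits = "0123456789abcdefghijklmnopqrstuvwxyz";
        if (n == 0) return "0";
        string s = "";
        while (n > 0) { s = digits[n % radix] + s; n /= radix; }
        return s;
    }

    // payload, radix, count, keywords are the four arguments passed to the
    // packed function: ('...', 36, 42, '...'.split('|')).
    public static string Unpack(string payload, int radix, int count, string[] keywords)
    {
        // Walk the dictionary from the last index down, as while(c--) does.
        for (int c = count - 1; c >= 0; c--)
        {
            // Empty dictionary entries mean "leave the token unchanged".
            if (c >= keywords.Length || keywords[c].Length == 0) continue;
            string token = @"\b" + ToBase(c, radix) + @"\b";
            string word = keywords[c];
            // A MatchEvaluator avoids '$' in the word being read as a group reference.
            payload = Regex.Replace(payload, token, m => word);
        }
        return payload;
    }

    // Pulls the first file:"..." (or file:'...') value out of the unpacked text.
    public static string ExtractFile(string unpacked)
    {
        Match m = Regex.Match(unpacked, @"file\s*:\s*[""']([^""']+)[""']");
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Calling Unpack with the payload string, 36, 42 and the '|'-separated dictionary from the question's script, then passing the result to ExtractFile, should give the file value shown in the listing above. Packers that use a radix above 36 encode tokens differently, so this sketch would not cover them.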

varnaud

HtmlAgilityPack only parses the static HTML; to get the data you need to execute the dynamic content (JavaScript).

You have three options:

1 - Implement a JavaScript beautifier/unpacker in your C# code (see http://jsbeautifier.org/ for an example). In this case, and only in your case, you can extract the video URL because it is embedded in the packed script, but that is not common.

2 - Use the .NET WebBrowser control to load the page and execute the JavaScript, then scrape the data. In this case your application must be a Windows Forms application.

3 - Use a headless browser to load the page and execute the JavaScript, then scrape the data. You could use the well-known PhantomJS. Example here: C# example of using PhantomJS webdriver ExecutePhantomJS to filter out images
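For option 1, the first step is getting the packed call's arguments out of the script text without running any JavaScript. Below is a rough sketch of that step only. The PackedArgs and PackedScriptParser names are hypothetical, and the regular expression assumes the exact wrapper shape seen in the question, }('payload',radix,count,'a|b|c'.split('|')), with single-quoted strings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical container for the four arguments of the packed call.
sealed class PackedArgs
{
    public string Payload;
    public int Radix;
    public int Count;
    public string[] Keywords;
}

static class PackedScriptParser
{
    // Matches the tail of the wrapper: }('payload',radix,count,'k0|k1|...'.split('|')
    // The quoted strings may contain backslash escapes such as \'.
    static readonly Regex Tail = new Regex(
        @"\}\s*\(\s*'((?:[^'\\]|\\.)*)'\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*'((?:[^'\\]|\\.)*)'\.split\('\|'\)",
        RegexOptions.Singleline);

    // Returns null when the text does not look like a packed script.
    public static PackedArgs Parse(string scriptText)
    {
        Match m = Tail.Match(scriptText);
        if (!m.Success) return null;
        return new PackedArgs
        {
            // Simplified unescaping: every \x becomes x (enough for \' and \\,
            // not for escapes like \n).
            Payload = Regex.Replace(m.Groups[1].Value, @"\\(.)", "$1"),
            Radix = int.Parse(m.Groups[2].Value),
            Count = int.Parse(m.Groups[3].Value),
            Keywords = m.Groups[4].Value.Split('|')
        };
    }
}
```

The payload, radix, count and keyword list returned here are the inputs an unpacking routine needs: each base-radix token in the payload is replaced by the keyword at that index. If the site changes the wrapper (double quotes, a different split call), the pattern would have to be adjusted.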

dlopezgonzalez