
I'm scraping a website with the Rust tool scraper for learning purposes, and I plan to expose it as a Node module with neon-bindings.

While scraping the list of image URLs that the page loads, I noticed a "packed" function in a script tag:

eval(function(p, a, c, k, e, d) {
    ...
}(...))

From this answer I learned that this is a function that decompresses obfuscated JS code.

So I used this tool to unpack the code, and got this:

var newImgs = [
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/1_8773.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/2_2594.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/3_9540.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/4_1324.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/5_1520.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/6_3015.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/7_6748.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/8_4063.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/9_1616.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/10_2885.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/11_6712.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/12_4984.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/13_5132.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/14_4691.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1',
    'http://manhua1034-104-250-139-219.cdnmanhua.net/1/432/1039483/15_9655.jpg?cid=1039483&key=d4366ee77be6255eeba85878cf442bbe&type=1'
]

This contains exactly the data I want, but I have no idea how to get it in Rust.

Is there any way to get the data out of obfuscated JS code in Rust?

The page I plan to scrape: http://m.dm5.com/m1039483/
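For reference, once the unpacked text is available as a Rust string, pulling the URLs out looks like the easy part. Below is a minimal sketch using only the standard library; `extract_urls` is a hypothetical helper name, and it assumes every URL is a single-quoted literal starting with `http` and that the text contains no escaped quotes. The part I'm missing is the unpacking step that produces this text.

```rust
/// Collect every single-quoted string literal that starts with "http".
///
/// Assumption: the input has no escaped quotes, so splitting on `'`
/// yields the text inside quotes at every odd-numbered piece.
fn extract_urls(unpacked: &str) -> Vec<String> {
    unpacked
        .split('\'')
        .skip(1) // drop the text before the first quote
        .step_by(2) // keep only the pieces that sit between quotes
        .filter(|s| s.starts_with("http"))
        .map(str::to_owned)
        .collect()
}
```

Calling `extract_urls` on the `var newImgs = [...]` text above should give one `String` per image URL, in page order.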

Herohtar
jiale ko
  • Doesn't seem like it. As a last resort you can call [JS code from Rust](https://rust-by-example-ext.com/webassembly/nodejshelper.html); it's not fast, but it should be suitable for scraping – Alexey S. Larionov Aug 14 '20 at 16:08
  • If the decompression function isn't too complex, you could try reimplementing it in Rust? – eggyal Aug 14 '20 at 16:19
  • @AlexLarionov that may actually be a good idea for my case if there's no other choice, but if I'm not mistaken, the example you provided looks like a library similar to neon-bindings. Since I'm already using neon-bindings, I'm not sure whether it suits my case – jiale ko Aug 14 '20 at 17:07
  • @eggyal the decompression function is actually quite complex for me, which is why I'm currently looking for an alternative solution if possible – jiale ko Aug 14 '20 at 17:13
  • @jialeko the main problem with implementing a crate for it is that you can't really unpack the file unless you know the library/algorithm that packed it. And unfortunately there are many libraries... You're lucky to have even discovered which exact algorithm was used for this particular website – Alexey S. Larionov Aug 14 '20 at 17:38
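
The reimplementation idea from the comments can be sketched roughly as follows. This is only a sketch under assumptions: it assumes the `function(p, a, c, k, e, d)` wrapper is the common Dean Edwards style packer, that the payload string, the base `a`, and the `|`-separated keyword list have already been pulled out of the `eval(...)` call, and that the base is at most 62. The names `encode`, `unpack`, and `flush` are hypothetical, not part of any crate.

```rust
use std::collections::HashMap;

/// Encode index `c` in the packer's base-`a` alphabet (0-9, a-z, A-Z).
fn encode(c: usize, a: usize) -> String {
    let mut out = if c < a { String::new() } else { encode(c / a, a) };
    let r = (c % a) as u32;
    let ch = if r > 35 {
        // 36..=61 map to 'A'..='Z', mirroring String.fromCharCode(c + 29)
        char::from_u32(r + 29).unwrap_or('?')
    } else {
        // 0..=35 map to '0'..='9', 'a'..='z', mirroring c.toString(36)
        char::from_digit(r, 36).unwrap_or('?')
    };
    out.push(ch);
    out
}

/// Append the pending word to `out`, replaced by its dictionary entry if any.
fn flush(token: &mut String, dict: &HashMap<String, &str>, out: &mut String) {
    if token.is_empty() {
        return;
    }
    out.push_str(dict.get(token.as_str()).copied().unwrap_or(token.as_str()));
    token.clear();
}

/// Replace every encoded word in `payload` with its keyword.
/// Returns `None` for bases this sketch does not handle.
fn unpack(payload: &str, base: usize, keywords: &[&str]) -> Option<String> {
    if !(2..=62).contains(&base) {
        return None;
    }
    // An empty keyword means "the word stands for itself", so it is skipped.
    let dict: HashMap<String, &str> = keywords
        .iter()
        .enumerate()
        .filter(|(_, w)| !w.is_empty())
        .map(|(i, w)| (encode(i, base), *w))
        .collect();

    let mut out = String::with_capacity(payload.len());
    let mut token = String::new();
    for ch in payload.chars() {
        if ch.is_ascii_alphanumeric() || ch == '_' {
            token.push(ch); // still inside a \w+ run
        } else {
            flush(&mut token, &dict, &mut out);
            out.push(ch);
        }
    }
    flush(&mut token, &dict, &mut out);
    Some(out)
}
```

With a made-up keyword table `["newImgs", "http", "example", "var"]` and base 36, `unpack("3 0=['1://2']", 36, &words)` yields `var newImgs=['http://example']`. Extracting the three arguments from the real `eval(...)` call is left out here, and a site using a different packer would need a different decoder.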

0 Answers