How to get all the results from a web page just as the browser shows when finished scrolling down

Question

I'm trying to get all the video results from a web page:

$ curl -qs https://ok.ru/video/c335170 | pup '.video-card_lk attr{href}' | wc -l
24

Another method returns the same result:

$ wget --config="/dev/null" -qO- https://ok.ru/video/c335170 | grep -oP '/video/\d+' | sort -u | wc -l
24

EDIT 1: I scrolled the web page to the end in Firefox and saved it as c335170.html, but I get the same result:

$ cat c335170.html | grep -oP '/video/\d+' | sort -u | wc -l
24

However, after scrolling to the end of the page in the web browser, it shows 81 results.

I have the same problem with YouTube, where the "Load more" button hides results from command-line HTTP clients:

$ curl -qs https://www.youtube.com/user/impacttvouaga/videos | grep -oP "/watch\?v=[\w-]+" | uniq | wc -l
21

EDIT 2: I've just saved this web page with Firefox as "Web Page, HTML only" to RMC_IMPACTV__YouTube.html, and then:

$ cat RMC_IMPACTV__YouTube.html | grep -oP "/watch\?v=[\w-]+" | uniq | wc -l
21

How can I get the remote HTTP server to give me all the results?

See https://stackoverflow.com/questions/14417994/how-can-be-scraped-using-php-curl-a-webpage-with-infinite-scroll — peak, Apr 30 '19 at 21:55
@peak Wow, this is getting much more complicated than I thought. Does this mean I have to write a `https://ok.ru`-specific PHP script to retrieve what I want? — SebMa, Apr 30 '19 at 22:20
I'd try to find out whether ok.ru has an API, so you can avoid all the complexities of simulating "on scroll" triggers. (The simulation would not have to be done in PHP ...) — peak, Apr 30 '19 at 22:25
@peak First, I'd like to try something much simpler and save `https://ok.ru/video/c335170` with Firefox as `Web Page, HTML only` to `c335170.html`, but somehow Firefox does not save all the results it shows into this file. Any idea why? — SebMa, Apr 30 '19 at 22:48
@peak I did so, but it does not work and I don't understand why. Take a look at my EDIT 1. — SebMa, Apr 30 '19 at 23:29
See also https://unix.stackexchange.com/questions/440965/how-to-curl-full-web-page-content — peak, May 01 '19 at 00:40
@peak I think I'll give console browsers (lynx, w3m, ...) a try before digging any deeper. — SebMa, May 01 '19 at 11:17
I was able to download the expanded HTML using the Chrome "Developer Tools" interface, but it involves several steps ... — peak, May 01 '19 at 18:53
@peak I found this Firefox add-on: [Save Page WE](https://addons.mozilla.org/en-US/firefox/addon/save-page-we/). It works :-) — SebMa, May 01 '19 at 19:58
@peak This add-on is also available for Chrome: [Save Page WE](https://chrome.google.com/webstore/detail/save-page-we/dhhpefjklgkmgeafimnjhojgjamoafof) — SebMa, May 03 '19 at 01:36
@peak I've tried a few add-ons and it seems [Scroll it!](https://chrome.google.com/webstore/detail/scroll-it/nlndoolndemidhlomaokpfbicfnjeeed/related) is the most appropriate I've found so far, when it comes to auto scrolling down. — SebMa, May 04 '19 at 19:29

score 0 · Answer 1 · answered Apr 16 '20 at 23:47


I ended up using two browser add-ons: Scroll it! to scroll down to the end of the page automatically, and Save Page WE to save the expanded HTML once all the results are shown.
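For anyone who prefers a script to the two add-ons, the same scroll-then-save idea can be sketched in Python with Selenium. This is only a sketch under assumptions: Selenium and a Firefox driver are not mentioned anywhere in this thread, the helper names `scroll_to_bottom`, `count_video_links` and `main` are made up for illustration, and it assumes the page loads more results each time it is scrolled and stops growing once everything is loaded. I have not verified it against ok.ru or YouTube.

```python
import re
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=50):
    """Scroll down until the page height stops growing.

    `driver` is any object with a Selenium-style execute_script() method.
    Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the current bottom; this is what triggers lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # assumption: `pause` seconds is enough for new results
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was appended, so we reached the end
        last_height = new_height
    return max_rounds


def count_video_links(html):
    """Count distinct /video/<digits> links, like grep -oP | sort -u | wc -l."""
    return len(set(re.findall(r"/video/\d+", html)))


def main():
    # Imported here so the helpers above work without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get("https://ok.ru/video/c335170")
        scroll_to_bottom(driver)
        # page_source reflects the DOM after scrolling, not the initial response.
        print(count_video_links(driver.page_source))
    finally:
        driver.quit()
```

Calling `main()` needs the `selenium` package and a working Firefox driver. The pause length and the "height stopped growing" stop condition are guesses that may need tuning per site, and a page that uses a "Load more" button instead of scroll events would need a click step rather than scrolling.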


SebMa
