EDIT 1: I'd like to extract video URLs and titles from the "https://ok.ru/video/c1404844" results using the CLI.
Here's what I've done so far:
The regex for each video's relative URL (Perl-compatible, because of \d, hence grep -P) is:
/video/\d+
and the absolute video URL looks like this: https://ok.ru$videoRelativeURL
I can use this command to extract the video URLs (I use uniq because many video IDs appear 3 times):
$ curl -s https://ok.ru/video/c1404844 | grep -oP "/video/\d+" | uniq | sed "s|^|https://ok.ru|" | head -5
https://ok.ru/video/1896971373228
https://ok.ru/video/1896971438764
https://ok.ru/video/1896971569836
https://ok.ru/video/1896971635372
https://ok.ru/video/1898415590060
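One caveat with the command above: uniq only collapses duplicates that sit on adjacent lines, so it relies on the repeated IDs appearing next to each other in the page. If that ever stops being true, an order-preserving filter is a possible alternative. Here is a minimal sketch; the helper name dedupe_keep_order and the sample IDs are made up for illustration and are not taken from the real page:

```shell
# Hypothetical helper: drop repeated lines while keeping first-seen order.
# awk prints a line only the first time it is seen.
dedupe_keep_order() {
  awk '!seen[$0]++'
}

# Sample input (invented IDs): the same ID appears on non-adjacent lines,
# which plain uniq would not collapse.
printf '%s\n' /video/111 /video/222 /video/111 /video/333 /video/222 |
  dedupe_keep_order |
  sed 's|^|https://ok.ru|'
```

On the real page this would slot in where uniq is, i.e. between grep -oP and sed.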
Then I tried extracting the relative video URLs and titles with pup.
EDIT 3: I replaced the class name video-card_n ellip with video-card_n.ellip. However, pup only outputs the attribute of the second selector (video-card_n.ellip), which is strange:
$ curl -s https://ok.ru/video/c1404844 | pup '.video-card_lk attr{href}, .video-card_n.ellip attr{title}' | head -5
Death.in.Paradise.S02E05.WEBRip.x264-ION10
Death.in.Paradise.S02E02.WEBRip.x264-ION10
Death.in.Paradise.S02E04.WEBRip.x264-ION10
Death.in.Paradise.S02E03.WEBRip.x264-ION10
Death.in.Paradise.S02E06.WEBRip.x264-ION10
That didn't work, so I converted the expanded HTML to JSON with this command:
$ curl -s https://ok.ru/video/c1404844 | pup 'json{}' > c1404844.json
Now I want to extract the title from video-card_n ellip and the href from video-card_lk in the resulting JSON file with the jq tool, but I don't know jq well enough.
I'd like jq (or pup) to output a flat file: the URL as the first column and the title as the second column.
EDIT 2: A big thank you to @peak for his help on jq!
DONE:
$ curl -s https://ok.ru/video/c1404844 | pup 'json{}' | jq -r 'recurse | arrays[] | select(.class == "video-card_lk").href,select(.class == "video-card_n ellip").title' | awk '{videoRelativeURL = $0; url = "https://ok.ru" gensub(/\?.*$/, "", 1, videoRelativeURL); getline title; print url " # " title}' | head
https://ok.ru/video/1898417425068 # Death.in.Paradise.S02E05.WEBRip.x264-ION10
https://ok.ru/video/1898417359532 # Death.in.Paradise.S02E02.WEBRip.x264-ION10
https://ok.ru/video/1898417293996 # Death.in.Paradise.S02E04.WEBRip.x264-ION10
https://ok.ru/video/1898417228460 # Death.in.Paradise.S02E03.WEBRip.x264-ION10
https://ok.ru/video/1898417162924 # Death.in.Paradise.S02E06.WEBRip.x264-ION10
https://ok.ru/video/1898417097388 # Death.in.Paradise.S02E07.WEBRip.x264-ION10
https://ok.ru/video/1898417031852 # Death.in.Paradise.S02E08.WEBRip.x264-ION10
https://ok.ru/video/1898416966316 # Death.in.Paradise.S02E01.WEBRip.x264-ION10
https://ok.ru/video/1898416769708 # Death.in.Paradise.S07E02.The.Stakes.Are.High.WEBRip.x264-ION10
https://ok.ru/video/1898416704172 # Death.in.Paradise.S07E03.Written.in.Murder.WEBRip.x264-ION10
...
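Since jq emits the href and the title on alternating lines, the pairing step can also be done without getline. The sketch below is one possible variant, not the command used above: paste - - joins each pair of lines with a tab, then awk strips the query string from the first field only. The helper name pair_url_title and the sample input (including the ?x=1 query string) are invented for illustration, and it assumes titles never contain a tab:

```shell
# Hypothetical helper: reads alternating href/title lines on stdin and
# prints "absolute URL<TAB>title" rows.
pair_url_title() {
  # paste - - puts two consecutive input lines on one tab-separated row.
  paste - - |
    awk -F '\t' -v OFS='\t' '{
      sub(/\?.*$/, "", $1)          # drop the query string from the href only
      print "https://ok.ru" $1, $2  # URL in column 1, title in column 2
    }'
}

# Sample input (invented): href, title, href, title.
printf '%s\n' '/video/111?x=1' 'Title.One' '/video/222' 'Title Two' |
  pair_url_title
```

A tab separator keeps the result easy to split again later with cut -f1 or cut -f2, whereas " # " could in principle also occur inside a title.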