
I have a small problem and I hope someone can help me with it. I have a script that downloads thumbnails from YouTube, and it works normally. Now I want to make it more advanced: given the URL of a playlist (the system for choosing one is already done), it should download the playlist's HTML page, find all the lines that contain /watch?v= (the video URLs), and then strip everything except the video ID (the characters after v=).

The downloading part already works; I just cannot find a way to get the lines with /watch?v=.

Here's my code that downloads the webpage and looks for the matching lines:

```bash
read -r -p "Enter the url of the playlist : " link  # Ask for the URL

content=$(curl --silent "$link")  # Download the webpage

contentxt="$basedir/playlist_page.txt"  # File to store the webpage ($basedir is defined elsewhere in the script)

printf '%s\n' "$content" > "$contentxt"  # Save the webpage; quoting keeps its line breaks

url=$(grep -F "/watch?v=" "$contentxt")  # Find every line containing /watch?v=

echo "$url"  # Display the matching lines, to be used later
```
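One pitfall when saving the page this way: an unquoted `echo $content` word-splits the page on spaces, tabs, and newlines, and `echo` rejoins the words with single spaces, so the file ends up as one long line and `grep` can only return the whole page. A minimal sketch, using a hypothetical three-line string in place of the downloaded HTML:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the downloaded playlist page (three lines).
content=$'<a href="/watch?v=aaa">\n<p>text</p>\n<a href="/watch?v=bbb">'

# Unquoted: word splitting turns the newlines into spaces -> a single line.
unquoted_lines=$(echo $content | grep -c '')

# Quoted: the newlines survive, so grep can match individual lines.
quoted_lines=$(printf '%s\n' "$content" | grep -c '')

# Only the lines that actually contain a video link.
matches=$(printf '%s\n' "$content" | grep -F '/watch?v=')

echo "unquoted: $unquoted_lines line(s), quoted: $quoted_lines line(s)"
echo "$matches"
```

Quoting also stops the shell from treating the `?` in the URLs as a glob character.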

Thank you!

Pandawan

1 Answer


Here's an example of how this can be done using sed, tested on a page I just created on jsfiddle:

```bash
curl --silent http://jsfiddle.net/udfmq9jv/ | grep -F '/watch?v=' | sed -E 's!.*/watch\?v=([a-zA-Z0-9_-]*).*!\1!'
## a1Y73sPHKxw
## -rIEVBIP5yc
## dMH0bHeiRNg
```

Note that the exact regex matters here: according to the Stack Overflow question "How to validate youtube video ids?", the valid characters in a video ID are letters, digits, underscore, and dash, which is what the bracket expression `[a-zA-Z0-9_-]` matches.
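The `sed` command keeps only one ID per matching line, because the greedy leading `.*` skips ahead to the last `/watch?v=` on the line. If several links can share a line (as they often do in minified HTML), `grep -oE` prints every match on its own line instead. A sketch, using a hypothetical inline HTML snippet rather than a live playlist page:

```bash
#!/usr/bin/env bash
# Hypothetical minified HTML: two links on one line, one repeated on the next.
html='<a href="/watch?v=a1Y73sPHKxw&list=X">A</a><a href="/watch?v=-rIEVBIP5yc">B</a>
<a href="/watch?v=a1Y73sPHKxw"><img></a>'

# -o prints each match separately; the bracket expression is the
# video ID alphabet (letters, digits, underscore, dash).
# cut drops the 9-character "/watch?v=" prefix from each match.
ids_all=$(printf '%s\n' "$html" | grep -oE '/watch\?v=[A-Za-z0-9_-]+' | cut -c10-)

echo "$ids_all"
```

Both IDs on the first line are found, and the duplicate on the second line is still present; see the comments below for removing duplicates.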


There are a few ways of collecting the output of a command into a variable. Here's how to collect it into a bash array using process substitution, a `while` loop, and `read`:

```bash
ids=()
while read -r; do ids+=("$REPLY"); done < <(curl --silent http://jsfiddle.net/udfmq9jv/ | grep -F '/watch?v=' | sed -E 's!.*/watch\?v=([a-zA-Z0-9_-]*).*!\1!')
echo "${#ids[@]}"
## 3
echo "${ids[0]}"
## a1Y73sPHKxw
echo "${ids[1]}"
## -rIEVBIP5yc
echo "${ids[2]}"
## dMH0bHeiRNg
```
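In bash 4.0 and later, the `mapfile` builtin (also spelled `readarray`) reads lines straight into an array, which is shorter than the `while read` loop. A sketch, with `printf` standing in for the `curl | grep | sed` pipeline so it runs offline:

```bash
#!/usr/bin/env bash
# Requires bash >= 4 for mapfile.
# printf stands in for the curl | grep | sed pipeline (offline demo only).
mapfile -t ids < <(printf '%s\n' a1Y73sPHKxw -rIEVBIP5yc dMH0bHeiRNg)

echo "${#ids[@]}"          # prints 3
printf '%s\n' "${ids[@]}"  # one ID per line
```

The `-t` option strips the trailing newline from each element, matching what `read -r` stores in `REPLY`.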
bgoldst
  • Ok, it works, but it outputs the same ID twice; is that normal? Also, if I wanted to use that data, how would I put it into a variable, maybe an array or something? – Pandawan Aug 03 '15 at 04:32
  • Regarding the outputting of the same id twice, that will happen if the same id is present in the source. You can `...| sort| uniq` to remove duplicates. – bgoldst Aug 03 '15 at 04:42
  • Ok, I think I know why it outputs the IDs twice: there is one href on the link text and another on the thumbnail. I'll look for a way to remove duplicates. To put the result in a variable, should I do this? `variable=$(curl --silent http://jsfiddle.net/udfmq9jv/| grep -F '/watch?v='| sed -E 's!.*/watch\?v=([a-zA-Z0-9_-]*).*!\1!';)` – Pandawan Aug 03 '15 at 04:44
  • Use the following syntax: `for ((i = 0; i < ${#ids[@]}; ++i)); do stmt1; stmt2; ...; done;`. – bgoldst Aug 03 '15 at 05:08
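Putting the comments together: `sort | uniq` (or `sort -u`) removes duplicates but also reorders the IDs. If playlist order matters, `awk '!seen[$0]++'` keeps only the first occurrence of each line in its original position. A sketch, again with `printf` standing in for the download pipeline, that deduplicates and then walks the array with the `for ((...))` loop from the last comment:

```bash
#!/usr/bin/env bash
# printf stands in for the curl | grep | sed pipeline; each ID appears twice,
# as it would when both the title link and the thumbnail link are matched.
ids=()
while read -r; do
    ids+=("$REPLY")
done < <(printf '%s\n' a1Y73sPHKxw a1Y73sPHKxw -rIEVBIP5yc -rIEVBIP5yc dMH0bHeiRNg dMH0bHeiRNg |
         awk '!seen[$0]++')  # keep the first occurrence only, preserving order

# C-style loop over the array indices.
for ((i = 0; i < ${#ids[@]}; ++i)); do
    echo "video $i: ${ids[i]}"
done
```

In a real script, the `echo` in the loop body is where each ID would be handed to the existing thumbnail download code.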