Get all the lines with a certain string

Question

I have a small problem and I hope someone can help me with it. I have a script that downloads thumbnails from YouTube. It works, but now I want to make it more advanced by adding the option to give the URL of a playlist (the option-selection system is already done), fetch the HTML page of the playlist, find all the lines that contain /watch?v= (the URL of the video), and then strip out everything except the video id (the series of characters after v=).

I already have the downloading system working; I just cannot find a way to get the lines with /watch?v=.

Here's my code that downloads the webpage and searches for the matching lines:

read -p "Enter the url of the playlist : " link #Ask for url
content=$(curl $link --silent) #Downloads the webpage
contentxt="$basedir/playlist_page.txt" #Creates a file to store the webpage
echo $content > "$contentxt" #Saves the webpage into the file
url=$(grep -F "/watch?v=" $contentxt) #Find a line with the /watch?v=
echo $url #Displays that line containing the url to be used later

Thank you!

score 0 · Accepted Answer · edited May 23 '17 at 12:06

Here's an example of how this can be done using sed, tested on a page I just created on jsfiddle:

curl --silent http://jsfiddle.net/udfmq9jv/| grep -F '/watch?v='| sed -E 's!.*/watch\?v=([a-zA-Z0-9_-]*).*!\1!';
## a1Y73sPHKxw
## -rIEVBIP5yc
## dMH0bHeiRNg

Note that the exact regex is important here: according to "How to validate youtube video ids?", the valid characters in a video id are letters, digits, underscore, and dash.
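The sed expression above keeps only one id per line, because its leading `.*` is greedy. As a minimal sketch of an alternative, assuming a grep that supports the `-o` option (GNU and BSD grep both do), every match can be printed on its own line and the prefix stripped afterwards. The sample markup and the `page` and `found` variables are made up for illustration. Feeding the text through with its quotes intact, rather than saving it with an unquoted `echo $content`, also keeps the page from collapsing into a single line.

```bash
#!/usr/bin/env bash
# Hypothetical sample standing in for the downloaded playlist page.
page='<a href="/watch?v=a1Y73sPHKxw&list=PL1">one</a> <a href="/watch?v=-rIEVBIP5yc">two</a>
<p>no link on this line</p>
<a href="/watch?v=dMH0bHeiRNg&index=3">three</a>'

# -o prints each match on its own line, so two links on one line both survive.
# The quotes around "$page" preserve the newlines of the original text.
found=$(printf '%s\n' "$page" \
  | grep -oE '/watch\?v=[a-zA-Z0-9_-]+' \
  | sed 's!^/watch?v=!!')

# Print one id per line.
printf '%s\n' "$found"
```

With a real page, the `printf '%s\n' "$page"` stage would be replaced by the `curl --silent` call from the answer.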

There are a few ways of collecting the output of a command into a variable. Here's how it can be done using process substitution, a while loop, and read:

ids=(); while read -r; do ids+=("$REPLY"); done < <(curl --silent http://jsfiddle.net/udfmq9jv/| grep -F '/watch?v='| sed -E 's!.*/watch\?v=([a-zA-Z0-9_-]*).*!\1!');
echo ${#ids[@]};
## 3
echo "${ids[0]}";
## a1Y73sPHKxw
echo "${ids[1]}";
## -rIEVBIP5yc
echo "${ids[2]}";
## dMH0bHeiRNg
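On bash 4 or newer, `mapfile` (also spelled `readarray`) can fill the array in one step. This is a sketch of an alternative to the loop above, not a replacement for it; the `list_ids` function is a hypothetical stand-in for the curl, grep, and sed pipeline so that the example runs without network access.

```bash
#!/usr/bin/env bash
# Stand-in for: curl ... | grep ... | sed ... (hypothetical fixed output).
list_ids() {
  printf '%s\n' a1Y73sPHKxw -rIEVBIP5yc dMH0bHeiRNg
}

# -t strips the trailing newline from each line before storing it.
mapfile -t ids < <(list_ids)

# Number of elements, then the second element.
printf '%s\n' "${#ids[@]}"
printf '%s\n' "${ids[1]}"
```

Older shells, such as the bash 3.2 that ships with macOS, do not have `mapfile`, so the while loop remains the more portable choice.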

edited May 23 '17 at 12:06

Community


answered Aug 03 '15 at 04:22

bgoldst


Ok, it works but outputs the same id twice. Is that normal? Also, if I were to use that data, how would I get it and put it into a variable, maybe an array or something? – Pandawan Aug 03 '15 at 04:32
Regarding the outputting of the same id twice, that will happen if the same id is present in the source. You can `...| sort| uniq` to remove duplicates. – bgoldst Aug 03 '15 at 04:42
Ok, I think I know why it puts the ids twice, because there is a href for when you click on the link but also on the thumbnail. So I'll try to search for a way to remove duplicates. So to put in a variable, should I do this `variable=$(curl --silent http://jsfiddle.net/udfmq9jv/| grep -F '/watch?v='| sed -E 's!.*/watch\?v=([a-zA-Z0-9_-]*).*!\1!';)` – Pandawan Aug 03 '15 at 04:44
Use the following syntax: `for ((i = 0; i < ${#ids[@]}; ++i)); do stmt1; stmt2; ...; done;`. – bgoldst Aug 03 '15 at 05:08
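
Putting the comments above together, here is a rough sketch that drops duplicate ids and then walks the array by index. It uses `awk '!seen[$0]++'` rather than `sort | uniq` so that the playlist order is preserved; that swap, the sample `raw` list, and the `unique` array name are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical extracted ids; each appears twice (link href and thumbnail href).
raw='a1Y73sPHKxw
a1Y73sPHKxw
-rIEVBIP5yc
-rIEVBIP5yc
dMH0bHeiRNg
dMH0bHeiRNg'

# Keep the first occurrence of every line, preserving the original order.
unique=()
while IFS= read -r id; do
  unique+=("$id")
done < <(printf '%s\n' "$raw" | awk '!seen[$0]++')

# Index-based loop over the de-duplicated array.
for ((i = 0; i < ${#unique[@]}; ++i)); do
  printf '%d: %s\n' "$i" "${unique[i]}"
done
```

If the order does not matter, `sort -u` does the same job as `sort | uniq` in a single command.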
