I need to extract the video names from YouTube's index.html. I have been able to break the file into small chunks, each containing one video listing, but I cannot extract the video title from a chunk. My professor provided the following command, but I cannot get it to work in this situation:
number=`expr "$s" : ".*\/\([0-9,]*\)\/"`; echo $number # will print 250,4211
I'm not completely sure, but I think this method fails here because there are no spaces between the video title and the surrounding text. Here is a sample chunk that I need to extract the title from:
<li class="video-list-item "><a href="/watch?v=9BbgvlgDQMg&feature=g-sptl&cid=inp-hs-edt" class="video-list-item-link yt-uix-sessionlink" data-sessionlink="ei=CMzmroaB5bICFRiXIQoda3kX5g%3D%3D&feature=g-sptl%26cid%3Dinp-hs-edt" ><span class="ux-thumb-wrap contains-addto "><span class="video-thumb ux-thumb yt-thumb-default-120 "><span class="yt-thumb-clip"><span class="yt-thumb-clip-inner"><img src="http://s.ytimg.com/yt/img/pixel-vfl3z5WfW.gif" alt="Lil' Buck "Golden Gateway" Venice Beach California YAK FILMS Super Bowl 2012 Madonna Memphis Jookin" data-thumb="//i2.ytimg.com/vi/9BbgvlgDQMg/default.jpg" width="120" ><span class="vertical-align"></span></span></span></span><span class="video-time">3:51</span>
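From this chunk, I need to extract the value of the img tag's alt attribute, "Lil' Buck "Golden Gateway" Venice Beach California YAK FILMS Super Bowl 2012 Madonna Memphis Jookin", without the surrounding quotes (the quotes around Golden Gateway are part of the title and should stay).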
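Here is a rough sketch of what I am trying to get working, adapting the expr command above. It assumes the title is always the alt value and is always followed immediately by `" data-thumb=`, which I have only checked against this one chunk. The variable s and the shortened sample are just for illustration.

```shell
# Hypothetical input: only the <img> part of the chunk above, to keep it short.
s=$(cat <<'EOF'
<img src="http://s.ytimg.com/yt/img/pixel-vfl3z5WfW.gif" alt="Lil' Buck "Golden Gateway" Venice Beach California YAK FILMS Super Bowl 2012 Madonna Memphis Jookin" data-thumb="//i2.ytimg.com/vi/9BbgvlgDQMg/default.jpg" width="120" >
EOF
)

# expr anchors its pattern at the start of the string, hence the leading .*
# The capture starts after alt=" and ends at the quote just before data-thumb=,
# so the quotes inside the title do not cut the match short.
title=`expr "$s" : '.*alt="\(.*\)" data-thumb='`
echo "$title"
```

Matching up to `" data-thumb=` rather than up to the next quote is deliberate: a pattern like `alt="\([^"]*\)"` would stop at the quote before Golden Gateway.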