I am working on a shell script that takes the IMDb numeric code of a movie, as found in the movie's page URL on the site (e.g., 0076759
corresponds to "Star Wars: A New Hope"). My intention is that when the user runs the script like this:
bash search_movie 0076759
the output is as follows:
Star Wars: Episode IV - A New Hope (1977)
Luke Skywalker joins forces with a...[Rest of Plot Summary Text here]
This is my current script:
#!/usr/bin/bash
# moviedata--Given a movie or TV title, returns a list of matches. If the user
# specifies an IMDb numeric index number, however, returns the synopsis of
# the film instead.
# Remember to install lynx with command: sudo yum install lynx
titleurl="http://www.imdb.com/title/tt"
imdburl="http://www.imdb.com/find?s=tt&exact=true&ref_=fn_tt_ex&q="
tempout="/tmp/moviedata.$$"
# Produce a synopsis of the film.
summarize_film() {
  grep "<title>" "$tempout" | sed 's/<[^>]*>//g;s/(more)//'
  grep --color=never -A2 '<h5>Plot:' "$tempout" | tail -1 | \
    cut -d\< -f1 | fmt | sed 's/^/ /'
  exit 0
}
trap 'rm -f "$tempout"' 0 1 15
if [ $# -eq 0 ] ; then
  echo "Usage: $0 {movie title | movie ID}" >&2
  exit 1
fi
# Checks whether we're asking for a title by IMDb title number
nodigits="$(echo "$1" | sed 's/[[:digit:]]*//g')"
if [ $# -eq 1 ] && [ -z "$nodigits" ] ; then
  lynx -source "$titleurl$1/combined" > "$tempout"
  summarize_film
  exit 0
fi
# It's not an IMDb title number, search for titles.
fixedname="$(echo "$*" | tr ' ' '+')" # for the URL
url="$imdburl$fixedname"
lynx -source "$url" > "$tempout"
# No results:
fail="$(grep --color=never '<h1 class="findHeader">No ' "$tempout")"
if [ -n "$fail" ] ; then
  echo "Failed: no results found for $1"
  exit 1
# If more than one matching title found:
elif [ -n "$(grep '<h1 class="findHeader">Displaying' "$tempout")" ] ; then
  grep --color=never '/title/tt' "$tempout" | \
  sed 's/</\
</g' | \
  grep -vE '(\.png|\.jpg|>[ ]*$)' | \
  grep -A 1 "a href=" | \
  grep -v '^--$' | \
  sed 's/<a href="\/title\/tt//g;s/<\/a> //' | \
  awk '(NR % 2 == 1) { title=$0 } (NR % 2 == 0) { print title " " $0 }' | \
  sed 's/\/.*>/: /' | \
  sort
fi
exit 0
When I run the script, it reaches the relevant movie page successfully, but it does not return the plot summary, and it also outputs a mess of website tracker info.
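One way to narrow this down is sketched below. It rests on two assumptions that I have not confirmed against the current IMDb markup: that the saved page puts `<title>` on the same line as other markup (which would explain the tracker text, since `grep "<title>"` prints the whole matching line), and that the `<h5>Plot:` marker may no longer be present at all. The helper names `extract_title` and `count_plot_markers` are hypothetical and are not part of the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting a saved page; not part of the original script.

# Print only the text of the first <title> element, even when it shares a
# line with other markup. grep -o emits just the matching part of the line,
# and sed then strips the surrounding tags.
extract_title() {
  grep -o '<title>[^<]*</title>' "$1" | head -n 1 | sed 's/<[^>]*>//g'
}

# Print how many lines contain the plot marker that summarize_film greps for.
# A count of 0 would mean the plot pipeline has nothing to work with.
count_plot_markers() {
  grep -c '<h5>Plot:' "$1"
}
```

Both helpers take the path of a saved page as their only argument, so they could be pointed at the file the script writes to `$tempout` (before the trap removes it) to see whether the title can be isolated and whether the plot marker exists in what lynx actually downloaded.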
I would greatly appreciate it if I could get some insight into what I'm doing wrong in my script.