
I'm trying to get the content of a script tag from a web page; however, it does not work.

This is the script tag in the page that I'm trying to scrape:

<script type="text/javascript">_rg.update({"bootstrap":{"apps":{"comingLeaving":{},"canonicalsDir":{"data":[],"isLoading":false,"hasLoaded":false,"loadError":false},"sitemap":{}},"entities":{"entries":{"movie:3fe720fa-13dd-4421-9e0b-0ce6a2efdd4f:@global":{"title":"My Neighbor Totoro","released_on":"1988-04-16T00:00:00","imdb_rating":8.2,"rt_critics_rating":94,"rg_content_score":100,"has_poster":true,"has_backdrop":true,"slug":"my-neighbor-totoro-1988","rg_id":"3fe720fa-13dd-4421-9e0b-0ce6a2efdd4f...</script>

This is my code:

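import requests
from bs4 import BeautifulSoup

link = 'https://reelgood.com/movie/dollars-1971'
print(link)
source = requests.get(link).text
soup = BeautifulSoup(source, "html.parser")
# Select the 14th <script> element that is a direct child of <head>
content = soup.select_one('head > script:nth-of-type(14)')
print(content)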

When I print content, it prints None. Any help?

Ori Peleg

2 Answers

import requests
from bs4 import BeautifulSoup

link='https://reelgood.com/movie/dollars-1971'
print(link)
source = requests.get(link).text
soup = BeautifulSoup(source, "html.parser")
content = soup.find_all("head")
print(content)

I tried the same request and looked at what the head contains. There are no script tags in it.

This is the head:

   <head>
   <meta charset="utf-8"/>
   <meta content="app-id=1031391869" name="apple-itunes-app"/>
   <meta content="width=device-width, initial-scale=1.0, maximum-scale=2.0, minimum-scale=1.0" name="viewport"/>
   <meta content="app-id=1031391869" name="apple-itunes-app"/>
   <meta content="#091017" name="theme-color"/>
   <link href="/manifest.b99aa2adf1e15b406bab.json" rel="manifest"/>
   <meta content="yes" name="mobile-web-app-capable"/>
   <meta content="#081017" name="theme-color"/>
   <meta content="Reelgood" name="application-name"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-icon-57x57.png" rel="apple-touch-icon" sizes="57x57"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-icon-60x60.png" rel="apple-touch-icon" sizes="60x60"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-icon-72x72.png" rel="apple-touch-icon" sizes="72x72"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-icon-76x76.png" rel="apple-touch-icon" sizes="76x76"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-icon-114x114.png" rel="apple-touch-icon" sizes="114x114"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-icon-120x120.png" rel="apple-touch-icon" sizes="120x120"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-icon-144x144.png" rel="apple-touch-icon" sizes="144x144"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-icon-152x152.png" rel="apple-touch-icon" sizes="152x152"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-icon-180x180.png" rel="apple-touch-icon" sizes="180x180"/>
   <meta content="yes" name="apple-mobile-web-app-capable"/>
   <meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"/>
   <meta content="Reelgood" name="apple-mobile-web-app-title"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/favicon-32x32.png" rel="icon" sizes="32x32" type="image/png"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/favicon-16x16.png" rel="icon" sizes="16x16" type="image/png"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/favicon.ico" rel="shortcut icon"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-320x460.png" media="(device-width: 320px) and (device-height: 480px) and (-webkit-device-pixel-ratio: 1)" rel="apple-touch-startup-image"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-640x920.png" media="(device-width: 320px) and (device-height: 480px) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-640x1096.png" media="(device-width: 320px) and (device-height: 568px) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-750x1294.png" media="(device-width: 375px) and (device-height: 667px) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-1182x2208.png" media="(device-width: 414px) and (device-height: 736px) and (orientation: landscape) and (-webkit-device-pixel-ratio: 3)" rel="apple-touch-startup-image"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-1242x2148.png" media="(device-width: 414px) and (device-height: 736px) and (orientation: portrait) and (-webkit-device-pixel-ratio: 3)" rel="apple-touch-startup-image"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-748x1024.png" media="(device-width: 768px) and (device-height: 1024px) and (orientation: landscape) and (-webkit-device-pixel-ratio: 1)" rel="apple-touch-startup-image"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-768x1004.png" media="(device-width: 768px) and (device-height: 1024px) and (orientation: portrait) and (-webkit-device-pixel-ratio: 1)" rel="apple-touch-startup-image"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-1496x2048.png" media="(device-width: 768px) and (device-height: 1024px) and (orientation: landscape) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image"/>
   <link href="https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/icons/apple-touch-startup-image-1536x2008.png" media="(device-width: 768px) and (device-height: 1024px) and (orientation: portrait) and (-webkit-device-pixel-ratio: 2)" rel="apple-touch-startup-image"/>
   <style type="text/css">
      a,abbr,acronym,address,applet,article,aside,audio,b,big,blockquote,body,canvas,caption,center,cite,code,dd,del,details,dfn,div,dl,dt,em,embed,fieldset,figcaption,figure,footer,form,h1,h2,h3,h4,h5,h6,header,hgroup,html,i,iframe,img,ins,kbd,label,legend,li,mark,menu,nav,object,ol,output,p,pre,q,ruby,s,samp,section,small,span,strike,strong,sub,summary,sup,table,tbody,td,tfoot,th,thead,time,tr,tt,u,ul,var,video{margin:0;padding:0;border:0;font-size:100%;vertical-align:baseline}article,aside,details,figcaption,figure,footer,header,hgroup,menu,nav,section{display:block}body{line-height:1}ol,ul{list-style:none}blockquote,q{quotes:none}blockquote:after,blockquote:before,q:after,q:before{content:"";content:none}table{border-collapse:collapse;border-spacing:0}@font-face{font-family:ProximaNova-Bold;src:url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/04359cf2.eot);src:url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/04359cf2.eot?#iefix) format("embedded-opentype"),url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/a174539b.woff2) format("woff2"),url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/e225e423.woff) format("woff"),url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/57d07452.ttf) format("truetype");font-display:swap}@font-face{font-family:ProximaNova-Medium;src:url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/72e21325.eot);src:url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/72e21325.eot?#iefix) format("embedded-opentype"),url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/accaaa5f.woff2) format("woff2"),url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/6837fa9a.woff) format("woff"),url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/2d652581.ttf) format("truetype");font-display:swap}body 
*{font-family:ProximaNova-Medium,Arial,sans-serif;font-weight:400}body b,body h1,body h2,body h3,body h4,body h5,body h6,body strong{font-family:ProximaNova-Bold,Arial,sans-serif}@font-face{font-display:swap;font-family:reelgood-icons;src:url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/97eaea8a.eot);src:url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/97eaea8a.eot#iefix) format("embedded-opentype"),url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/f419c062.ttf) format("truetype"),url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/f2af4e47.woff) format("woff"),url(https://assets.reelgood.com/p/038f409f1cfce8336709c5eea5f285b67e119f88/3230bae8.svg#reelgood-icons) format("svg");font-weight:400;font-style:normal}[class*=" icon-"],[class^=icon-]{font-family:reelgood-icons;speak:none;font-style:normal;font-weight:400;font-variant:normal;text-transform:none;line-height:1;-webkit-font-smoothing:antialiased;-moz-osx-font-smoothing:grayscale}.icon-account:before{content:"\e900"}.icon-browseBest:before{content:"\e901"}.icon-browseGrid:before{content:"\e902"}.icon-browseTable:before{content:"\e903"}.icon-checkMark:before{content:"\e904"}.icon-dollarSign:before{content:"\e905"}.icon-embed:before{content:"\e906"}.icon-episodes:before{content:"\e907"}.icon-friendlyPlay:before{content:"\e908"}.icon-information:before{content:"\e909"}.icon-leftRightArrow:before{content:"\e90a"}.icon-offline:before{content:"\e90b"}.icon-play:before{content:"\e90c"}.icon-plusToSee:before{content:"\e90d"}.icon-reelgoodR:before{content:"\e90e"}.icon-search:before{content:"\e90f"}.icon-seasonsBrowseIcon:before{content:"\e910"}.icon-seenAllToHere:before{content:"\e911"}.icon-share:before{content:"\e912"}.icon-thickCloseX:before{content:"\e913"}.icon-thinCloseX:before{content:"\e914"}.icon-tracking:before{content:"\e915"}.icon-upDownArrow:before{content:"\e916"}.icon-viewMoreDots:before{content:"\e917"}htm
l{overflow-x:hidden;overflow-y:scroll;min-width:100%;min-height:100%;}html:not(.embed){background-color:#0a1016;}html body{min-width:100%;min-height:100%;-webkit-font-smoothing:antialiased;}html body.askew{transform:rotate(1deg);}html body a{outline:0;}html body #nprogress .bar{background:rgba(0,220,136,0.9);will-change:transform;}html body #nprogress .peg{box-shadow:0 0 10px #00dc89,0 0 5px #00dc89;}html body button,html body [role="button"]{outline:none;}html body .ReactVirtualized__Grid,html body .ReactVirtualized__List{outline:none;}
      html body .ReactVirtualized__Grid__innerScrollContainer {
      padding-right: 64px;
      }
      @media screen and (min-width: 1025px) {
      html body .ReactVirtualized__Grid__innerScrollContainer {
      padding-right: 64px;
      }
      }
      @media screen and (max-width: 1025px) {
      html body .ReactVirtualized__Grid__innerScrollContainer {
      padding-right: 4vw;
      }
      }
      @media screen and (max-width: 768px) {
      html body .ReactVirtualized__Grid__innerScrollContainer {
      padding-right: 40px;
      }
      }
      @media screen and (max-width: 420px) {
      html body .ReactVirtualized__Grid__innerScrollContainer {
      padding-right: 18px;
      }
      }
      html body.noScroll{overflow:hidden;}html.noScroll{overflow-y:hidden;}.amp-mp{position:absolute;height:50px;width:50px;opacity:0.1;}
   </style>
   <title data-react-helmet="true" itemprop="name" lang="en">$ (1971) - Where to Watch It Streaming Online | Reelgood</title>
   <meta content="$ (1971) - Where to Watch It Streaming Online" data-react-helmet="true" name="title" property="og:title">
   <meta content="$ is only available for rent or buy starting at $2.99. Get notified if it comes to one of your streaming services, like Netflix, on reelgood.com." data-react-helmet="true" name="description" property="og:description">
   <meta content="https://img.reelgood.com/content/movie/82c55bc0-25f9-44d6-9b95-1ee6551f4b3c/poster-342.jpg" data-react-helmet="true" name="image" property="og:image">
   <meta content="video.movie" data-react-helmet="true" property="og:type">
   <meta content="image/jpg" data-react-helmet="true" property="og:image:type">
   <meta content="342" data-react-helmet="true" property="og:image:width">
   <meta content="513" data-react-helmet="true" property="og:image:height">
   <meta content="https://reelgood.com/movie/dollars-1971" data-react-helmet="true" property="og:url">
   <meta content="summary" data-react-helmet="true" name="twitter:card">
   <meta content="$ (1971) - Where to Watch It Streaming Online" data-react-helmet="true" name="twitter:title">
   <meta content="$ is only available for rent or buy starting at $2.99. Get notified if it comes to one of your streaming services, like Netflix, on reelgood.com." data-react-helmet="true" name="twitter:description">
   <meta content="https://img.reelgood.com/content/movie/82c55bc0-25f9-44d6-9b95-1ee6551f4b3c/poster-342.jpg" data-react-helmet="true" name="twitter:image"/>
   <meta content="$, 1971, streaming, online, watch, buy, rent, movie" data-react-helmet="true" name="keywords"/>
   <link data-react-helmet="true" href="https://reelgood.com/movie/dollars-1971" rel="canonical">
   <link data-react-helmet="true" href="https://reelgood.com/movie/dollars-1971?amp=true" rel="amphtml">
   <link data-react-helmet="true" href="https://reelgood.com/movie/dollars-1971" hreflang="x-default" rel="alternate">
   <link data-react-helmet="true" href="https://reelgood.com/movie/dollars-1971" hreflang="en-us" rel="alternate">
   <link data-react-helmet="true" href="https://reelgood.com/movie/dollars-1971" hreflang="en" rel="alternate">
   <link data-react-helmet="true" href="https://reelgood.com/uk/movie/dollars-1971" hreflang="en-gb" rel="alternate"/>
   </link></link></link></link></link></meta></meta></meta></meta></meta></meta></meta></meta></meta></meta></meta>
</head>
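A quick way to confirm this kind of finding is to count the script tags in whatever HTML the server actually returned. The sketch below is only an illustration: it uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages, and the two HTML strings and the `count_scripts` helper are made-up names and samples, not the real Reelgood response.

```python
from html.parser import HTMLParser


class ScriptCounter(HTMLParser):
    """Counts opening <script> tags seen while parsing."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased from HTMLParser.
        if tag == "script":
            self.count += 1


def count_scripts(html: str) -> int:
    """Hypothetical helper: number of <script> tags in an HTML string."""
    parser = ScriptCounter()
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up samples standing in for two different server responses.
without_script = '<head><meta charset="utf-8"/><title>x</title></head>'
with_script = '<head><script type="text/javascript">_rg.update({})</script></head>'

print(count_scripts(without_script), count_scripts(with_script))  # prints: 0 1
```

If the count is zero for the downloaded source, a selector such as `head > script:nth-of-type(14)` has nothing to match and returns None.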
Janu

You need to set two headers (user-agent and accept-language) on the request in order to get the expected page source. Once you extract the JavaScript object housing the data you want, you need to escape the unescaped " characters so that it is valid JSON; then you can parse it with the json package. I use some code by @tobias_k (cited below) to handle the fix.
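The extraction step can be tried offline first. This is a minimal sketch, assuming a heavily shortened, made-up page source whose nesting follows the excerpt in the question (bootstrap, entities, entries); the live payload is far larger and may be laid out differently.

```python
import json
import re

# Hypothetical, shortened stand-in for the downloaded page source.
sample_html = (
    '<script type="text/javascript">_rg.update('
    '{"bootstrap":{"entities":{"entries":{'
    '"movie:3fe720fa-13dd-4421-9e0b-0ce6a2efdd4f:@global":'
    '{"title":"My Neighbor Totoro","imdb_rating":8.2}}}}}'
    ')</script>'
)

# Capture everything between "_rg.update(" and the first ")<",
# i.e. the closing parenthesis right before </script>.
match = re.search(r'_rg\.update\((.*?)\)<', sample_html)
if match is None:
    raise ValueError("_rg.update(...) call not found in the page source")

payload = match.group(1)
data = json.loads(payload)  # this sample is already valid JSON

# Walk down to the entries and collect the titles.
entries = data["bootstrap"]["entities"]["entries"]
titles = [entry["title"] for entry in entries.values() if "title" in entry]
print(titles)  # prints: ['My Neighbor Totoro']
```

Checking for `None` before calling `.group(1)` gives a clearer error if the site changes its markup or the headers are not accepted.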

import json
import re

import requests

headers = {'user-agent': 'Mozilla/5.0', 'accept-language': 'en-GB,en-US;q=0.9,en;q=0.8'}

r = requests.get('https://reelgood.com/movie/dollars-1971', headers=headers).text

s = re.search(r'_rg\.update\((.*?)\)<', r).group(1)

while True:  #https://stackoverflow.com/a/18515887 @tobias_k
    try:
        result = json.loads(s)   # try to parse...
        break                    # parsing worked -> exit loop
    except Exception as e:
        # "Expecting , delimiter: line 34 column 54 (char 1158)"
        # position of unexpected character after '"'
        unexp = int(re.findall(r'\(char (\d+)\)', str(e))[0])
        # position of unescaped '"' before that
        unesc = s.rfind(r'"', 0, unexp)
        s = s[:unesc] + r'\"' + s[unesc+1:]
        # position of corresponding closing '"' (+2 for inserted '\')
        closg = s.find(r'"', unesc + 2)
        s = s[:closg] + r'\"' + s[closg+1:]
print(result.keys())
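The repair loop above can also be wrapped in a function and checked on a tiny input. The sketch below is a variation, not the cited code: it reads the error offset from `json.JSONDecodeError.pos` instead of parsing the message text, and it adds a retry limit so that input which is broken for other reasons cannot loop forever. The function name `loads_with_quote_repair`, the `max_repairs` parameter, and the sample string are made up for illustration.

```python
import json


def loads_with_quote_repair(s: str, max_repairs: int = 100):
    """Parse s as JSON, escaping one stray pair of inner quotes per failed attempt."""
    for _ in range(max_repairs):
        try:
            return json.loads(s)
        except json.JSONDecodeError as e:
            # e.pos is the index of the unexpected character after the stray '"'
            unesc = s.rfind('"', 0, e.pos)
            if unesc == -1:
                raise
            s = s[:unesc] + r'\"' + s[unesc + 1:]
            # matching closing '"' (+2 skips the inserted backslash and the quote)
            closg = s.find('"', unesc + 2)
            if closg == -1:
                raise
            s = s[:closg] + r'\"' + s[closg + 1:]
    raise ValueError("gave up after too many repair attempts")


# Made-up example: a string value containing unescaped inner quotes.
broken = '{"title": "The "Best" Movie"}'
print(loads_with_quote_repair(broken))  # prints: {'title': 'The "Best" Movie'}
```

Like the original loop, this is a heuristic: it assumes every parse error is caused by a pair of unescaped quotes inside a string value.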
QHarr