
The source code of the HTML page is shown below:

<html>
<head>
<meta http-equiv=Content-Type content="text/html; charset=gb2312">
<script>
    document.domain = "xxxx.com";
    var jsonObj = {
        list: [
            {ip: "166.255.255.25", port: 1080, path: "/data/pps.jpeg"}
        ]
    }
    var jsParObj = {param1: 25532, param2: 54463}
</script>
</head>
<body>
</body>
</html>

I am trying to extract the data from that HTML page and store it in JSON format.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
script_text = soup.find('script')  # returns the whole <script> Tag, not just its text

Using the Python library BeautifulSoup4, I get this:

<script>
    document.domain = "xxxx.com";
    var jsonObj = {
        list: [
            {ip: "166.255.255.25", port: 1080, path: "/data/pps.jpeg"}
        ]
    }
    var jsParObj = {param1: 25532, param2: 54463}
</script>
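`soup.find('script')` returns the whole `Tag`, which prints together with its `<script>` wrapper; in BeautifulSoup the inner text of a tag with a single text child is available through the tag's `.string` attribute. As a sketch of the same idea using only the standard library, the hypothetical `ScriptTextExtractor` class below collects the body of every `<script>` element with `html.parser.HTMLParser`, assuming the scripts contain plain inline JavaScript.

```python
from html.parser import HTMLParser


class ScriptTextExtractor(HTMLParser):
    """Collect the text inside every <script> element, without the tags."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []  # one string per <script> block

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # The parser may deliver a script body in several chunks, so append.
        if self._in_script:
            self.scripts[-1] += data


def extract_script_text(html_doc):
    """Return the stripped text of each <script> block in html_doc."""
    parser = ScriptTextExtractor()
    parser.feed(html_doc)
    parser.close()
    return [s.strip() for s in parser.scripts]


sample_html = """<html><head><script>
    var jsParObj = {param1: 25532, param2: 54463}
</script></head><body></body></html>"""
print(extract_script_text(sample_html)[0])
# var jsParObj = {param1: 25532, param2: 54463}
```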

How can I strip the <script> tag and convert that data into JSON format using Python?
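The script body is JavaScript, not JSON: the object keys (`list`, `ip`, `port`, ...) are unquoted, so `json.loads` rejects it directly. A minimal sketch, assuming the data is always assigned as `var NAME = {...}` with double-quoted string values, is to locate each object literal by brace matching, quote the bare keys with a regular expression, and then parse with the `json` module. The helper names `js_vars_to_dict` and `_matching_brace` are hypothetical, and this is not a full JavaScript parser: single-quoted strings, comments, or strings that themselves contain text like `, key:` would need a real JS parser.

```python
import json
import re

# Matches "var NAME = {" and captures NAME; the match ends just after the "{".
VAR_START = re.compile(r"\bvar\s+([A-Za-z_$][\w$]*)\s*=\s*\{")
# Matches an unquoted key that follows "{" or "," (e.g. "{ip:" or ", port:").
BARE_KEY = re.compile(r"([{,]\s*)([A-Za-z_$][\w$]*)\s*:")


def _matching_brace(text, start):
    """Return the index just past the '}' that closes the '{' at text[start]."""
    depth = 0
    in_string = False
    i = start
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced braces in script")


def js_vars_to_dict(script_text):
    """Map each 'var NAME = {...}' in script_text to a Python dict."""
    result = {}
    for m in VAR_START.finditer(script_text):
        start = m.end() - 1  # index of the opening '{'
        literal = script_text[start:_matching_brace(script_text, start)]
        result[m.group(1)] = json.loads(BARE_KEY.sub(r'\1"\2":', literal))
    return result


sample_script = """
    document.domain = "xxxx.com";
    var jsonObj = {
        list: [
            {ip: "166.255.255.25", port: 1080, path: "/data/pps.jpeg"}
        ]
    }
    var jsParObj = {param1: 25532, param2: 54463}
"""
print(json.dumps(js_vars_to_dict(sample_script)))
# {"jsonObj": {"list": [{"ip": "166.255.255.25", "port": 1080, "path": "/data/pps.jpeg"}]}, "jsParObj": {"param1": 25532, "param2": 54463}}
```

Note that `document.domain` is a plain assignment rather than a `var` object literal, so this sketch skips it; the text passed in would be the script body with the tags already stripped (for example via BeautifulSoup's `.string`).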

Mr Lister
randy Pen
