I am very new to web scraping. I tried running the following code:

import requests
import json

headers={'Host': 'www.bloomberg.com',
 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
 'Accept': '*/*',
 'Accept-Language': 'de,en-US;q=0.7,en;q=0.3',
 'Accept-Encoding': 'gzip, deflate, br',
 'Referer': 'https://www.bloomberg.com/quote/AAPL:INDAAPL:IND',
 'DNT': '1',
 'Connection': 'keep-alive',
 'TE': 'Trailers'}
url='https://www.bloomberg.com/markets2/api/datastrip/IBVC%3AIND?locale=en&customTickerList=true'
response = requests.get(url=url, headers=headers)

response.json()

It raises the following error:

 ---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-5-543d39c3046b> in <module>
     14 response = requests.get(url=url, headers=headers)
     15 
---> 16 response.json()

c:\programdata\anaconda3\lib\site-packages\requests\models.py in json(self, **kwargs)
    896                     # used.
    897                     pass
--> 898         return complexjson.loads(self.text, **kwargs)
    899 
    900     @property

c:\programdata\anaconda3\lib\json\__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    355             parse_int is None and parse_float is None and
    356             parse_constant is None and object_pairs_hook is None and not kw):
--> 357         return _default_decoder.decode(s)
    358     if cls is None:
    359         cls = JSONDecoder

c:\programdata\anaconda3\lib\json\decoder.py in decode(self, s, _w)
    335 
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

c:\programdata\anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
    353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
--> 355             raise JSONDecodeError("Expecting value", s, err.value) from None
    356         return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I searched the web and found a couple of answered questions here, but I was unable to identify the issue. In particular, I tried following the comment in this [link][1], but it did not help. That is, I changed the last line to:

requests.get(url, headers=headers).json()

I also tried the following code, which decodes the response as text in case the URL returns HTML rather than JSON:

import requests
import json

headers={'Host': 'www.bloomberg.com',
 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
 'Accept': '*/*',
 'Accept-Language': 'de,en-US;q=0.7,en;q=0.3',
 'Accept-Encoding': 'gzip, deflate, br',
 'Referer': 'https://www.bloomberg.com/quote/AAPL:INDAAPL:IND',
 'DNT': '1',
 'Connection': 'keep-alive',
 'TE': 'Trailers'}
url='https://www.bloomberg.com/markets2/api/datastrip/IBVC%3AIND?locale=en&customTickerList=true'
response = requests.get(url=url, headers=headers)
response.content.decode('utf-8')

This gives the following result:

'<!doctype html>\n<html>\n<head>\n    <title>Bloomberg - Are you a robot?</title>\n    <meta name="viewport" content="width=device-width, initial-scale=1">\n    <link rel="stylesheet" type="text/css" href="https://assets.bwbx.io/font-service/css/BWHaasGrotesk-55Roman-Web,BWHaasGrotesk-75Bold-Web,BW%20Haas%20Text%20Mono%20A-55%20Roman/font-face.css">\n    <style rel="stylesheet" type="text/css">\n        html, body, div, span, applet, object, iframe,\n        h1, h2, h3, h4, h5, h6, p, blockquote, pre,\n        a, abbr, acronym, address, big, cite, code,\n        del, dfn, em, img, ins, kbd, q, s, samp,\n        small, strike, strong, sub, sup, tt, var,\n        b, u, i, center,\n        dl, dt, dd, ol, ul, li,\n        fieldset, form, label, legend,\n        table, caption, tbody, tfoot, thead, tr, th, td,\n        article, aside, canvas, details, embed,\n        figure, figcaption, footer, header, hgroup,\n        menu, nav, output, ruby, section, summary,\n        time, mark, audio, video {\n            margin: 0;\n            padding: 0;\n            border: 0;\n            font-size: 100%;\n            font: inherit;\n            vertical-align: baseline;\n        }\n        /* HTML5 display-role reset for older browsers */\n        article, aside, details, figcaption, figure,\n        footer, header, hgroup, menu, nav, section {\n            display: block;\n        }\n        body {\n            line-height: 1;\n        }\n        ol, ul {\n            list-style: none;\n        }\n        blockquote, q {\n            quotes: none;\n        }\n        blockquote:before, blockquote:after,\n        q:before, q:after {\n            content: \'\';\n            content: none;\n        }\n        table {\n            border-collapse: collapse;\n            border-spacing: 0;\n        }\n\n        * {\n            box-sizing: border-box;\n        }\n\n        body {\n            background-color: #f2f2f2;\n            font-family: "BWHaasGrotesk-55Roman-Web";\n      
      line-height: 1.2;\n        }\n\n        .header {\n            margin: 0;\n            height: 60px;\n            width: 100%;\n            background-color: black;\n            color: white;\n            overflow-x: hidden;\n        }\n\n        .logo {\n            float: left;\n            margin: 0 20px;\n            height: 60px;\n            width: 140px;\n            background-image: url(\'data:image/svg+xml;base64,PHN2ZyBpZD0iTGF5ZXJfMSIgZGF0YS1uYW1lPSJMYXllciAxIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNTcuNzUgNDcuNjMiPjxkZWZzPjxzdHlsZT4uY2xzLTF7ZmlsbDojZmZmO308L3N0eWxlPjwvZGVmcz48dGl0bGU+Qmxvb21iZXJnX05IR193aHQ8L3RpdGxlPjxwYXRoIGNsYXNzPSJjbHMtMSIgZD0iTTgxLjczLDExMzhIMTAwLjZjMy41NywwLDYuMzIuODcsOC4yNiwyLjQ1YTkuNDUsOS40NSwwLDAsMSwzLjM3LDcuNmMwLDMuNjctMS40OCw2LTQuNTQsNy4zOXYwLjE1YzQsMS4zMyw2LjI3LDQuOSw2LjI3LDkuMjMsMCw0LjEzLTEuNTgsNy4zNC00LjE4LDkuMjgtMi4xOSwxLjU4LTUsMi4zNS04LjgyLDIuMzVIODEuNzNWMTEzOFptMTcsMTVjMiwwLDMuNTItMS4xMiwzLjUyLTMuMzdzLTEuNTMtMy4yNi0zLjU3LTMuMjZIOTIuMTlWMTE1M2g2LjUzWm0xLDE0Ljg5YTMuOTMsMy45MywwLDEsMC0uMDUtNy44NUg5Mi4xOXY3Ljg1aDcuNVoiIHRyYW5zZm9ybT0idHJhbnNsYXRlKC04MS43MyAtMTEzNy45OCkiLz48cGF0aCBjbGFzcz0iY2xzLTEiIGQ9Ik0xMTUuOCwxMTM4aDkuODl2MzguNDVIMTE1LjhWMTEzOFoiIHRyYW5zZm9ybT0idHJhbnNsYXRlKC04MS43MyAtMTEzNy45OCkiLz48cGF0aCBjbGFzcz0iY2xzLTEiIGQ9Ik0xMjcuNjksMTE2Mi43N2MwLTguNjcsNS42MS0xNC41NCwxNC4yOC0xNC41NHMxNC4xOCw1Ljg3LDE0

This is also not the output I expected from the URL.

Thank you in advance.

[1]: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

rsc05
  • your code works for me - i.e. I am not able to reproduce the error. – buran Dec 02 '20 at 20:11
  • @buran how come? Are you using Python 3 or above? – rsc05 Dec 02 '20 at 20:28
  • yes, 3.7. I get the JSON response as expected. Is it possible that the Bloomberg site is blocking you? Why not print the response status code and `response.text` to see what you get? – buran Dec 02 '20 at 20:35
  • @buran still the same result: "Bloomberg - Are you a robot?". I am not sure if this means that Bloomberg blocked me. Why would that be the case? Aren't these data publicly available? – rsc05 Dec 02 '20 at 20:43
  • Not every site likes to be scraped. Bloomberg is definitely not one that is friendly to scrapers. It's the core of their business after all. – buran Dec 02 '20 at 20:45
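
The check suggested in the comments (look at the status code and body before calling `.json()`) could be sketched roughly as below. This is only a minimal illustration, not a way around the block: the helper name `parse_json_body`, the status code 200, and the sample JSON payload are all made up for the example, and the HTML string is just the first lines of the output shown in the question.

```python
import json


def parse_json_body(status_code, content_type, body):
    """Return the parsed JSON, or raise ValueError describing what came back.

    Hypothetical helper: takes the pieces of a response as plain values so
    it can be tried without any network access.
    """
    if "json" not in (content_type or "").lower():
        # Collapse whitespace and show the start of the body, so a block
        # page such as "Are you a robot?" is easy to recognise.
        snippet = " ".join(body[:200].split())
        raise ValueError(
            f"Expected JSON but got status {status_code}, "
            f"Content-Type {content_type!r}; body starts with: {snippet!r}"
        )
    return json.loads(body)


# Offline demonstration. The status code is an assumption; the question
# does not say which status the block page was served with.
blocked_body = (
    "<!doctype html>\n<html>\n<head>\n"
    "    <title>Bloomberg - Are you a robot?</title>"
)
try:
    parse_json_body(200, "text/html; charset=utf-8", blocked_body)
except ValueError as err:
    print(err)

# A made-up JSON body parses normally.
print(parse_json_body(200, "application/json", '{"price": 1.5}'))
```

With `requests`, the same helper would presumably be called as `parse_json_body(response.status_code, response.headers.get("Content-Type"), response.text)`. That replaces the opaque `JSONDecodeError` with a message showing that an HTML page was returned instead of JSON.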
