
I'm using the Python requests library to call an API, and the response comes back full of Unicode escape sequences. I'm trying to use Beautiful Soup to parse it and pull out specific fields, but I'm having trouble. Below is what I've done so far. Could someone give me some pointers?

    import requests
    from bs4 import BeautifulSoup

    api_call = "https://widgets.mindbodyonline.com/widgets/schedules/8d3262c705.json?mobile=false&version=0.1"
    page = requests.get(api_call)
    page.encoding = 'utf-8'
    soup = BeautifulSoup(page.text, 'html.parser')

The issue is that I can't parse the result, because the output (soup) isn't HTML. It looks like this instead:

{"contents":"  \u003clink rel=\"stylesheet\" media=\"screen\" href=\"https://assets.healcode.com/assets/widgets/healcode-a7a1803da55d050a9f4eae1dd980c68f4ab2e281010432da93b03a7c507ff5ee.css\" /\u003e\n  \n  \u003cstyle type=\"text/css\"\u003e\n  /* Add custom CSS here */    \n   div.healcode, div.healcode table.schedule tr td, div.healcode table.schedule tr td ol.schedule_list li { color: #4D4D4D !important; }  div.healcode a { color: #4D4D4D !important; }  div.healcode table.schedule tr.odd td, div.healcode div.list_view li.odd, div.healcode table.list_view ol.schedule_list li.odd { background-color: #F7F7F7 !important; }  div.healcode table.schedule tr.even td, div.healcode div.list_view li.even, div.healcode table.list_view ol.schedule_list li.even { background-color: #ffffff !important; }  div.healcode.schedule h1 { color: #4D4D4D !important; }  div.healcode .classname a { color: #4D4D4D !important; }  div.healcode table.schedule tr th, div.healcode table.schedule ol.schedule_list li.schedule_header { background-color: #6985B2 !important; }  div.healcode table.schedule tr th, div.healcode table.schedule ol.schedule_list li.schedule_header { color: #ffffff !important; }  div.healcode .week_links a, div.healcode .week_links a:visited, div.healcode .day_links a, div.healcode .day_links a:visited, div.healcode a.hc-button, div.healcode input.hc-button, div.healcode .healcode-date-links-area a { background: #EBA900 !important; }  div.healcode .week_links a, div.healcode .week_links a:visited, div.healcode .day_links a, div.healcode .day_links a:visited, div.healcode a.hc-button, div.healcode input.hc-button, div.healcode .healcode-date-links-area a { color: #ffffff !important; }  div.healcode .week_links a:hover, div.healcode .day_links a:hover, div.healcode a.hc-button:hover, div.healcode input.hc-button a:hover, div.healcode .healcode-date-links-area a:hover { background: #F9CD58 !important; }  div.healcode .week_links a:hover, div.healcode .day_links a:hover, 
div.healcode a.hc-button:hover, div.healcode input.hc-button a:hover, div.healcode .healcode-date-links-area a:hover { color: #ffffff !important; } \n  \u003c/style\u003e\n  \u003cstyle type=\"text/css\"\u003e\n    .hc_footer { text-align: right; }\n    .hc-privacy-footer { margin-top: 1em; }\n    .hc_footer img, .hc-privacy-footer img { border:none; height: 2.0em; }\n    .hc-privacy-
  • All web data is Unicode data. BeautifulSoup handles decoding transparently for you, or you can pass in the expected encoding. Why are you creating the soup *twice*? – Martijn Pieters Mar 12 '20 at 22:43
  • See [retrieve links from web page using python and BeautifulSoup](https://stackoverflow.com/a/22583436) for my advice on how to handle BeautifulSoup and encodings. – Martijn Pieters Mar 12 '20 at 22:44
  • Your link isn't to an HTML page but to a JSON document, with a partial HTML fragment residing inside its 'contents' field. – Boris Lipschitz Mar 12 '20 at 22:53
  • `BeautifulSoup(json.loads(page.content)['contents'], 'html.parser')` will give you a usable soup object. And yes, remove the `page.encoding = 'utf-8'` line, you don't need that. – Boris Lipschitz Mar 12 '20 at 22:54
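
Putting the comments above together, a minimal sketch might look like the following. It assumes the endpoint returns a JSON object whose `contents` field holds the HTML fragment, as the output in the question suggests. `SAMPLE`, `html_from_widget_json` and `soup_from_widget_json` are names made up for illustration, and the sample payload is a shortened stand-in, not the real response.

```python
import json

# Made-up, shortened stand-in for the widget response (not the real payload).
SAMPLE = (
    r'{"contents": "\u003cstyle type=\"text/css\"\u003e'
    r'.hc_footer { text-align: right; }\u003c/style\u003e\n'
    r'\u003cdiv class=\"healcode\"\u003eHello\u003c/div\u003e"}'
)


def html_from_widget_json(raw):
    """Decode the JSON body and return the HTML fragment stored under 'contents'."""
    # json.loads turns the JSON escapes (unicode escapes, escaped quotes)
    # back into plain characters such as < and ".
    return json.loads(raw)["contents"]


def soup_from_widget_json(raw):
    """Build a soup from the decoded fragment (needs beautifulsoup4 installed)."""
    # Imported here so the JSON step above still works without bs4.
    from bs4 import BeautifulSoup
    return BeautifulSoup(html_from_widget_json(raw), "html.parser")


if __name__ == "__main__":
    soup = soup_from_widget_json(SAMPLE)
    # Look up an element in the decoded fragment by tag and CSS class.
    print(soup.find("div", class_="healcode").get_text())
```

With a live response, requests can do the JSON step itself: `page.json()['contents']` should give the same string, so there is no need to set `page.encoding` by hand.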
