I am scraping a certain webpage using the requests and BeautifulSoup libraries in Python.

I located the element I want; simplified, it looks like this:

<script>
data = {'user':{'id':1,'name':'joe','age':18,'email':'joe@hotmail.com'}}
</script>

I want to get the email value into a variable, but the whole element comes back as a list, and when I take the text of that tag I can't parse it as JSON: it raises errors that point at column positions. Any ideas? I'd appreciate any help.

Sameh Weangy

1 Answer

Something simple that may help you.

import json
from bs4 import BeautifulSoup

html = """
<script>
data = {'user':{'id':1,'name':'joe','age':18,'email':'joe@hotmail.com'}}
</script>
"""

soup = BeautifulSoup(html, 'html.parser')
# The slice [7:] skips the leading `data = ` (7 characters),
# and the single quotes are replaced with double quotes so json.loads() accepts the text
json_data = json.loads(soup.find('script').text.strip()[7:].replace("'", '"'))
print(json_data)
print(type(json_data))

Output

{'user': {'id': 1, 'name': 'joe', 'age': 18, 'email': 'joe@hotmail.com'}}
<class 'dict'>
Druta Ruslan
  • You are getting close to what I want. I also tried this, and it gives me an error at a column position: `json_data = json.loads(soup.find_all('script')[3].text.strip()[21:])` File "C:\Users\TOSHIBA\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads return _default_decoder.decode(s) File "C:\Users\TOSHIBA\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 342, in decode raise JSONDecodeError("Extra data", s, end) json.decoder.JSONDecodeError: Extra data: line 1 column 3548 (char 3547) – Sameh Weangy Jul 04 '18 at 18:08
  • Can you show the `script` tag that you want to scrape? – Druta Ruslan Jul 04 '18 at 18:10
  • I'm not sure I can post it here because it's too long. – Sameh Weangy Jul 04 '18 at 18:11
  • This may help you. I think you have several `dict` objects: https://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data – Druta Ruslan Jul 04 '18 at 18:15
  • Actually, I want to get the csrf_token value shown here: [CSRF-TOKEN](https://prnt.sc/k2ki8i) – Sameh Weangy Jul 04 '18 at 18:18
  • That's probably because of the `;` at the end. Use something like `[7:-1]` instead of `[7:]`. Also, slightly more reliable than using such magic numbers is to get everything between the first `{` and the last `}` in the script tag content and parse it as JSON. – UltraInstinct Jul 04 '18 at 18:40
  • @UltraInstinct You are right! I checked by putting a `;` at the end, and it returns the same error: `Extra data: line 1 column 66 (char 65)` – Druta Ruslan Jul 04 '18 at 18:42
  • @UltraInstinct You are right. Thank you, guys: it is now accepted as JSON data, but I can't access the csrf_token from the JSON, or maybe I forgot the right code. Druta Ruslan, I really appreciate your help, thanks <3 – Sameh Weangy Jul 05 '18 at 17:51
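
The suggestion in the comments above (take everything between the first `{` and the last `}` rather than slicing by fixed offsets) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `extract_object`, `parse_object` and `find_key` are made up here, and the sample `script_text`, including where `csrf_token` sits, is hypothetical because the real page's script was never posted in the thread. The fallback to `ast.literal_eval` is a swapped-in alternative to replacing quotes; it copes with single-quoted strings but not with JavaScript-only literals such as `true` or `null`.

```python
import ast
import json


def extract_object(script_text):
    """Return the text between the first '{' and the last '}' (inclusive)."""
    start = script_text.find("{")
    end = script_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no {...} block found in script text")
    return script_text[start:end + 1]


def parse_object(raw):
    """Try strict JSON first, then fall back to a Python-style literal."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Handles single-quoted strings without a blind quote replacement.
        return ast.literal_eval(raw)


def find_key(obj, key):
    """Depth-first search for the first value stored under `key`."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical script content; the real page's structure is an assumption.
script_text = (
    "data = {'user': {'id': 1, 'email': 'joe@hotmail.com'}, "
    "'csrf_token': 'abc123'};"
)
data = parse_object(extract_object(script_text))
print(find_key(data, "email"))
print(find_key(data, "csrf_token"))
```

Because the trailing `;` falls outside the last `}`, it is dropped without any magic numbers. If the location of `csrf_token` in the parsed data is known, plain indexing such as `data['csrf_token']` is simpler than the recursive search.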