I'm getting a UnicodeEncodeError with the following code for a simple web scraper.
print 'JSON scraper initializing'
from bs4 import BeautifulSoup
import json
import requests
import geocoder
# Set page variable
page = 'https://www.bandsintown.com/?came_from=257&page='
urlBucket = []
for i in range(1, 3):
    uniqueUrl = page + str(i)
    urlBucket.append(uniqueUrl)
# Build response container
responseBucket = []
for i in urlBucket:
    uniqueResponse = requests.get(i)
    responseBucket.append(uniqueResponse)
# Build soup container
soupBucket = []
for i in responseBucket:
    individualSoup = BeautifulSoup(i.text, 'html.parser')
    soupBucket.append(individualSoup)
# Build events container
allSanFranciscoEvents = []
for i in soupBucket:
    script = i.find_all("script")[4]
    eventsJSON = json.loads(script.text)
    allSanFranciscoEvents.append(eventsJSON)
with open("allSanFranciscoEvents.json", "w") as writeJSON:
    json.dump(allSanFranciscoEvents, writeJSON, ensure_ascii=False)
print('end')
The odd thing is that sometimes this code works and doesn't give an error. It depends on the for i in range line of the code. For example, if I put in (2,4) for the range, it works fine. If I change it to (1,3), it fails with:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 12: ordinal not in range(128)
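As far as I can tell, the message only says that some string containing u'\xe9' was pushed through the ascii codec. A minimal sketch that triggers the same exception in isolation, using a made-up string rather than my scraped data:

```python
# u'\xe9' is the character named in the traceback (e with an acute accent).
text = u'Caf\xe9'  # made-up example string, not taken from the scraped data

try:
    text.encode('ascii')
    error_name = None
except UnicodeEncodeError as exc:
    error_name = type(exc).__name__
    failing_position = exc.start  # index of the character that could not be encoded

# The same string encodes without complaint as UTF-8.
utf8_bytes = text.encode('utf-8')
```

That would fit with page 1 containing an accented name that the other pages happen not to have, though I have not confirmed which event it is.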
Can anyone tell me how to fix this issue within my code? If I print allSanFranciscoEvents, it has read in all the data, so I believe the issue is happening in the final piece of code, with the JSON dump. Thanks so much.
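One direction I am considering is sketched below. It assumes the failure comes from json.dump with ensure_ascii=False handing unicode text to a file opened in plain "w" mode, which on Python 2 falls back to the ascii codec. The helper name dump_json_utf8 and the sample data are hypothetical stand-ins for illustration, not part of my scraper:

```python
import io
import json
import os
import tempfile


def dump_json_utf8(data, path):
    """Hypothetical helper: serialize data and write it out as UTF-8 text."""
    text = json.dumps(data, ensure_ascii=False)
    # On Python 2, json.dumps can return a byte string when every input
    # string is ASCII; io.open in text mode only accepts unicode, so decode.
    if isinstance(text, bytes):
        text = text.decode('utf-8')
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)


# Made-up sample standing in for the scraped events.
sample_events = [{u'name': u'Caf\xe9 du Nord'}]
out_path = os.path.join(tempfile.mkdtemp(), 'allSanFranciscoEvents.json')
dump_json_utf8(sample_events, out_path)

# Read the file back to check that the accented character survived.
with io.open(out_path, encoding='utf-8') as handle:
    round_tripped = json.load(handle)
```

The idea is to let io.open do the encoding explicitly instead of relying on the default codec, so the output file keeps the accented characters as UTF-8 rather than as \u escapes.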