The code is:
!wget -q -O 'boroughs.html' "https://en.wikipedia.org/wiki/List_of_London_boroughs"
from bs4 import BeautifulSoup

with open('boroughs.html', encoding='utf-8-sig') as fp:
    soup = BeautifulSoup(fp, "lxml")

data = []
table = soup.find("table", {"class": "wikitable sortable"})
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [col.text.strip() for col in cols]
    data.append([col for col in cols if col])  # get rid of empty values
data
I've added encoding='utf-8-sig' to open after some research, but in the output I still see \ufeff characters.
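As far as I can tell, utf-8-sig only strips a byte-order mark at the very start of the stream; \ufeff characters embedded later in the document (which can happen when the page markup itself contains them) survive decoding untouched. A minimal sketch of that behaviour:

```python
import codecs

# utf-8-sig removes a BOM only at the beginning of the byte stream.
raw = codecs.BOM_UTF8 + "start\ufeffmiddle".encode("utf-8")
decoded = raw.decode("utf-8-sig")

# The leading BOM is gone, but the embedded \ufeff remains.
print(repr(decoded))  # 'start\ufeffmiddle'
```

So if the \ufeff characters are inside the table cells rather than at the top of the file, the encoding argument cannot help.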
What puzzles me is that I've even tried the hacky way with

df = df.replace(u'\ufeff', '')

after loading the data into a pandas DataFrame, and the characters are still there.
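One detail that may explain the no-op: with a plain string pattern, pandas' DataFrame.replace only matches when the entire cell value equals the pattern, so a \ufeff embedded in longer text is never touched. Passing regex=True switches it to substring replacement. A sketch with a made-up single-column frame:

```python
import pandas as pd

# Hypothetical data: a \ufeff embedded at the start of a cell value.
df = pd.DataFrame({"borough": ["\ufeffBarnet", "Camden"]})

# Plain-string replace: whole-cell match only, so nothing changes,
# because no cell is exactly equal to '\ufeff'.
whole = df.replace("\ufeff", "")

# regex=True: substring replacement, strips the embedded character.
substring = df.replace("\ufeff", "", regex=True)

print(whole["borough"][0])      # still contains the \ufeff
print(substring["borough"][0])  # Barnet
```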