Two things:
1. "»" is a non-ASCII character (U+00BB), which Python represents with the escape "\u00bb", so splitting the text on that escape will work:
    parse = li.get_text().split('\u00bb')
Alternatively, you can use the re module to split on any non-ASCII character (you will need to import re if you choose this path):
    import re
    non_ascii = li.get_text()
    # [^\x00-\x7f] matches any non-ASCII character, as pointed out by Moinuddin Quadri in
    # https://stackoverflow.com/questions/40872126/python-replace-non-ascii-character-in-string
    parse = re.split(r'[^\x00-\x7f]', non_ascii)
However, the split returns a list of parts, and not every text in the "li" HTML tags contains the "»" character (e.g. the text "POZZUOLI-PROCIDA" at the end of the table on the website). We must account for that, or reading parse[1] will raise an IndexError for those entries.
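To make the two cases concrete, here is a minimal sketch that needs no network access. The helper name split_route is hypothetical, and the sample labels are assumed to look like the ones on the website; it also strips the whitespace left around each part.

```python
import re

def split_route(text):
    """Split a route label on any non-ASCII character and trim whitespace."""
    parts = [p.strip() for p in re.split(r'[^\x00-\x7f]', text)]
    # Drop empty fragments, e.g. when the label starts or ends with the separator
    return [p for p in parts if p]

# A label containing "»" yields two parts
print(split_route("POZZUOLI \u00bb CASAMICCIOLA"))  # ['POZZUOLI', 'CASAMICCIOLA']
# A label without "»" yields a single part, so parse[1] would not exist
print(split_route("POZZUOLI-PROCIDA"))              # ['POZZUOLI-PROCIDA']
```

Checking the length of the returned list before indexing is what keeps the second case from failing.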
2. A dictionary may be a poor choice of data structure, since the data you are parsing contains repeated keys.
For example, take POZZUOLI » CASAMICCIOLA and POZZUOLI » PROCIDA. CASAMICCIOLA and PROCIDA would both be stored under the same key, POZZUOLI, so Python will simply overwrite the value of that key. POZZUOLI: CASAMICCIOLA will become POZZUOLI: PROCIDA, instead of the two routes being kept as separate dictionary entries.
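A quick way to see the overwrite, as a small sketch using the two routes from the example (the variable names are only illustrative):

```python
routes = [("POZZUOLI", "CASAMICCIOLA"), ("POZZUOLI", "PROCIDA")]

as_dict = {}
for departure, arrival in routes:
    # The second assignment replaces the value stored by the first
    as_dict[departure] = arrival

print(as_dict)      # {'POZZUOLI': 'PROCIDA'}
print(len(routes))  # 2, the list of tuples keeps both routes
```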
I suggest storing each parsed route as a tuple in a list instead, like so:
    import re

    # bs is the BeautifulSoup object (see the test code further down)
    single_port = []
    ports = []
    medmar_live_departures_table = list(bs.select('li.tratta'))
    departure_time = []
    for li in medmar_live_departures_table:
        next_li = li.find_next_sibling("li")
        while next_li and next_li.get("data-toggle"):
            if next_li.get("class") == ["corsa-yes"]:
                # departure_time.append(next_li.strong.text)
                non_ascii = li.get_text()
                parse = re.split(r'[^\x00-\x7f]', non_ascii)
                # The if statement takes care of table data strings that don't have the non-ASCII character "»"
                if len(parse) > 1:
                    ports.append((parse[0], parse[1]))
                else:
                    single_port.append(parse[0])
            # Advance to the next sibling; without this the while loop never terminates
            next_li = next_li.find_next_sibling("li")

    # This will print out your data in your desired manner
    for i in ports:
        print("DEPARTURE: " + i[0])
        print("ARRIVAL: " + i[1])
    for i in single_port:
        print(i)
I also ran a test script that uses the split method:
    import requests
    from bs4 import BeautifulSoup
    import re

    url = "https://www.medmargroup.it/"
    response = requests.get(url)
    bs = BeautifulSoup(response.text, 'html.parser')
    timeTable = bs.find('section', class_="primarystyle-timetable")
    medmar_live_departures_table = timeTable.find('ul')
    single_port = []
    ports = []
    for li in medmar_live_departures_table.find_all('li', class_="tratta"):
        parse = li.get_text().split('\u00bb')
        # Entries without "»" produce a single part
        if len(parse) > 1:
            ports.append((parse[0], parse[1]))
        else:
            single_port.append(parse[0])
    for i in ports:
        print("DEPARTURE: " + i[0])
        print("ARRIVAL: " + i[1])
    for i in single_port:
        print(i)
I hope this helps!