
I scraped a website and inserted the data into a SQLite database (using sqlite3). For some reason, there's a black star in front of each name, and it isn't an asterisk. It's a single character that precedes the name. I don't want to go through every row and erase it by hand because that's tedious, and the stars will just come back whenever I update the table. Does anyone know how to remove the star before the data goes into the table? The data I extracted is a list whose elements are dictionaries. Thank you in advance!

import requests
from bs4 import BeautifulSoup
import byebyebirdie
import json
##from  tkinter import *
##bobert=Tk()
##bobert.geometry("600x600")
import sqlite3


connection = sqlite3.connect('covidproject.db')
cursor = connection.cursor()

##cursor.execute("DROP TABLE IF EXISTS covid ")

##cursor.execute("CREATE TABLE IF NOT EXISTS covid (name STRING, confirmed REAL, changes_today REAL,deceased REAL,active REAL, recovered REAL)")

url = 'https://ncov2019.live/'
headers = {'User-Agent':'Mozilla/5.0'}
response = requests.get(url, headers = headers)
response.status_code

soup = BeautifulSoup(response.content,'html.parser')
stat_table = soup.find_all("table", attrs={"class": "display responsive"})

headers = [header.get_text(strip=True) for header in soup.find_all("th")]
rows = [dict(zip(headers, [td.get_text(strip=True) for td in row.find_all("td")]))
        for row in soup.find_all("tr")[1:-1]]

print("e")
i=9
##while 8<i<223:
##    print("h")
##    roa=rows[i]['Name']
##    rob=rows[i]['Confirmed']
##    roc=rows[i]['Changes Today']
##    rod=rows[i]['Deceased']
##    roe=rows[i]['Active']
##    rof=rows[i]['Recovered']
##    cursor.execute("INSERT INTO covid VALUES('"+roa+"','"+rob+"','"+roc+"','"+rod+"','"+roe+"','"+rof+"')")
##    connection.commit()
##    i=i+1
    
print(json.dumps(rows[9], indent=2))
print(rows[222]['Name'])

The dictionaries in the list look like this:

{
  "Name": "\u2605Hong Kong",
  "Confirmed": "1,714",
  "Changes Today": "0",
  "Percentage Day Change": "0%",
  "Critical": "8",
  "Deceased": "11",
  "Percentage Death Change": "0%",
  "Tests": "442,256",
  "Active": "439",
  "Recovered": "1,264"
}

In sqlite3, the name looks like this: ★Albania

PRATHAMESH GIRI

3 Answers


The star is the \u2605 (BLACK STAR) character. The simplest fix is to strip it out of the dictionary value before inserting it:

a = {
  "Name": "\u2605Hong Kong",
  "Confirmed": "1,714",
  "Changes Today": "0",
  "Percentage Day Change": "0%",
  "Critical": "8",
  "Deceased": "11",
  "Percentage Death Change": "0%",
  "Tests": "442,256",
  "Active": "439",
  "Recovered": "1,264"
}
a['Name'] = ''.join(a['Name'].split('\u2605'))
Aleksander Ikleiw
  • I got an error that said "list indices must be integers or slices, not str". Do you know how I can fix that? –  Jul 18 '20 at 17:05
  • @RachelDing can you show me exactly which part of code does that error? – Aleksander Ikleiw Jul 18 '20 at 17:07
  • yea- it says "Traceback (most recent call last): File "C:\Users\minio\Downloads\sqlite-tools-win32-x86-3320300\sqlite-tools-win32-x86-3320300\webscraping3.py", line 29, in rows['Name'] = ''.join(rows['Name'].split('\u2605')) TypeError: list indices must be integers or slices, not str" –  Jul 18 '20 at 17:14
  • Sure. The rows variable is a list, and each element is a dictionary like the one in my question, except there are hundreds of them. –  Jul 18 '20 at 17:20
  • @RachelDing I am glad tho :) If you can upvote also, please do that – Aleksander Ikleiw Jul 18 '20 at 17:50
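The TypeError quoted in the comments arises because `rows` is a list, so the replacement has to be applied to each dictionary in turn rather than to `rows` itself. A minimal sketch, using made-up sample data shaped like the scraped rows; the `"Name" in row` guard is an assumption, added in case some scraped rows (for example header rows) produce a dictionary without that key:

```python
# Sample data shaped like the scraped rows; the values are illustrative only.
rows = [
    {"Name": "\u2605Hong Kong", "Confirmed": "1,714"},
    {"Name": "\u2605Albania", "Confirmed": "0"},
    {},  # hypothetical row with no cells, e.g. a header row
]

# rows is a list, so loop over it and index into each dictionary.
for row in rows:
    if "Name" in row:
        row["Name"] = row["Name"].replace("\u2605", "")

print([row.get("Name") for row in rows])
```

Running the loop once, right after `rows` is built, means every later insert sees the cleaned names.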

\u2605 is a Unicode character. To remove non-ASCII characters from a string, encode it as ASCII while ignoring errors, then decode it back.

>>> name = "\u2605Hong Kong"
>>> name.encode('ascii', errors='ignore').decode()
'Hong Kong'
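Because the ASCII round trip discards every non-ASCII character, a name containing accented letters would lose them along with the star. A small sketch comparing it with a replacement that targets only U+2605; the name used here is a hypothetical example, not taken from the scraped data:

```python
# Hypothetical name: a star followed by a name containing a non-ASCII letter.
name = "\u2605Cura\u00e7ao"

# The ASCII round trip drops every non-ASCII character, including the "ç".
ascii_only = name.encode("ascii", errors="ignore").decode()

# Replacing only U+2605 leaves the rest of the name intact.
star_removed = name.replace("\u2605", "")

print(ascii_only)
print(star_removed)
```

If all names are known to be plain ASCII apart from the star, both approaches give the same result.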
Prem Anand

Since the requirement is to remove the first character, which is the Unicode star, you can slice the string to drop it:

test = "\u2605Hong Kong"
test = test[1:]
print(test)

Output: Hong Kong

In your case, the solution would look like this:

Name = rows[i]['Name']
rows[i]['Name'] = Name[1:]
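Slicing with `[1:]` removes the first character whether or not it is a star, so a name without one would lose its first letter. One possible variation is sketched below: it strips only leading stars and then inserts the cleaned row with a parameterized query. The helpers `clean_name` and `to_number` are hypothetical names introduced for illustration, the table layout follows the commented-out CREATE TABLE in the question, and an in-memory database stands in for covidproject.db:

```python
import sqlite3

STAR = "\u2605"


def clean_name(name):
    """Remove leading star characters; names without a star pass through."""
    return name.lstrip(STAR)


def to_number(text):
    """Convert a string such as '1,714' to a float (assumes digits and commas)."""
    return float(text.replace(",", ""))


# One sample row copied from the question.
rows = [{
    "Name": "\u2605Hong Kong", "Confirmed": "1,714", "Changes Today": "0",
    "Deceased": "11", "Active": "439", "Recovered": "1,264",
}]

connection = sqlite3.connect(":memory:")  # stand-in for 'covidproject.db'
cursor = connection.cursor()
cursor.execute(
    "CREATE TABLE covid (name TEXT, confirmed REAL, changes_today REAL, "
    "deceased REAL, active REAL, recovered REAL)"
)

for row in rows:
    # Placeholders let sqlite3 handle quoting, so names containing
    # apostrophes do not break the statement.
    cursor.execute(
        "INSERT INTO covid VALUES (?, ?, ?, ?, ?, ?)",
        (
            clean_name(row["Name"]),
            to_number(row["Confirmed"]),
            to_number(row["Changes Today"]),
            to_number(row["Deceased"]),
            to_number(row["Active"]),
            to_number(row["Recovered"]),
        ),
    )
connection.commit()

stored = cursor.execute("SELECT name, confirmed FROM covid").fetchall()
print(stored)
```

Cells that are empty or hold non-numeric text would make `to_number` raise a ValueError, so real scraped data may need an extra check before conversion.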