Parsing a webstie's HTML table to create a SQLite Database, using Python (or migrating a CSV file)

Question

The objective is to parse the record of executed fellons in texas AND their last statement, and insert only the Execution Number (which already grants a primary key), the year of the execution, perhaps the age, and the content of their last statement in a SQLite database, with the purpose of assigning a 0 or 1 to each row, based on the length (eloquency indicator) of the felons, and make a yearly histogram of the assigned 0s and 1s.

@baduke was very kind to help me with the parsing part, resulting in a CSV file. However, My attempts to 'transplant' the CSV to a sqlite file have failed

ATTEMPT 1, with credits to of @Brian of StackOverflow

import csv
import sqlite3

conn = sqlite3.connect('ExecDDBB.sqlite')
cur = conn.cursor()


#Execution,Offender Information,Last Statement URL,Last Name,First Name,TDCJNumber,Age,Date,Race,County,Last Statement
cur.execute('''CREATE TABLE IF NOT EXISTS RAWCSV
            (Execution INTEGER PRIMARY KEY, Offender_Information TEXT UNIQUE, Last_Statement_URL TEXT UNIQUE, Last_Name TEXT, First_Name TEXT, TDCJNUmber INTEGER, Age INTEGER, Date TEXT, Race TEXT, County TEXT, Last_Statement TEXT)''')

with open ('offenders_data.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader) 
    query = 'insert into RAWCSV({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    for data in reader:
        cur.execute(query, data)
cur.commit()

which gives me the error:

Traceback (most recent call last):
  File "D:\Dropbox\Currículum Vitae\py4e\Capstone\CSVtoSQL(table9).py", line 14, in <module>
    columns = next(reader)
  File "C:\Users\Javier López\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4917: character maps to <undefined>

ATTEMPT 2 credit to mr. lordvan this page

import csv, uuid, dataset
import sqlite3

conn = sqlite3.connect('ExecDDBB.sqlite')
cur = conn.cursor()


#Execution,Offender Information,Last Statement URL,Last Name,First Name,TDCJNumber,Age,Date,Race,County,Last Statement
cur.execute('''CREATE TABLE IF NOT EXISTS RAWCSV
            (Execution INTEGER PRIMARY KEY, Offender_Information TEXT UNIQUE, Last_Statement_URL TEXT UNIQUE, Last_Name TEXT, First_Name TEXT, TDCJNUmber INTEGER, Age INTEGER, Date TEXT, Race TEXT, County TEXT, Last_Statement TEXT)''')

def csv2dataset(fin, fout):
    with open(fin, 'r') as finp:
        inpdata = csv.DictReader(finp)
        db = dataset.connect('sqlite:///' + fout)
        table = db['RAWCSV']
        for contact in inpdata:
            contact['uuid'] = uuid.uuid4().hex
            contact['verified'] = False
            table.insert(contact)

if __name__ == '__main__':
    csv2dataset('offenders_data.csv', 'ExecDDBB.sqlite')
    
cur.close()

which in turn gives me the following error

Traceback (most recent call last):
  File "D:\Dropbox\Currículum Vitae\py4e\Capstone\CSVtoSQL(table9).py", line 23, in <module>
    csv2dataset('offenders_data.csv', 'ExecDDBB.sqlite')
  File "D:\Dropbox\Currículum Vitae\py4e\Capstone\CSVtoSQL(table9).py", line 17, in csv2dataset
    for contact in inpdata:
  File "C:\Users\Javier López\AppData\Local\Programs\Python\Python39\lib\csv.py", line 110, in __next__
    self.fieldnames
  File "C:\Users\Javier López\AppData\Local\Programs\Python\Python39\lib\csv.py", line 97, in fieldnames
    self._fieldnames = next(self.reader)
  File "C:\Users\Javier López\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4917: character maps to <undefined>

My idea is to transfer all the CSV content into a table called RAWCSV (or something similar), and then I would duplicate the table, drop the tables I'm not really interested in, and then proceed to clip the date to get just the year, and assign the length indicator. Afterwards I would run another program to create the histogram.

Any suggestions, or help with the debugging, please?

Does this answer your question? [Converting a csv file into a list of tuples with python](https://stackoverflow.com/questions/18776370/converting-a-csv-file-into-a-list-of-tuples-with-python) — Sabito stands with Ukraine, Nov 28 '20 at 08:00

score 0 · Answer 1 · answered Nov 23 '20 at 21:57

If you're flexible with 3rd party packages, try pandas.

import pandas as pd
df = pd.read_csv(...)
df.to_sql(...)

See the official doc:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

Parsing a webstie's HTML table to create a SQLite Database, using Python (or migrating a CSV file)

1 Answers1