The objective is to parse the record of executed fellons in texas AND their last statement, and insert only the Execution Number (which already grants a primary key), the year of the execution, perhaps the age, and the content of their last statement in a SQLite database, with the purpose of assigning a 0 or 1 to each row, based on the length (eloquency indicator) of the felons, and make a yearly histogram of the assigned 0s and 1s.
@baduke was very kind to help me with the parsing part, resulting in a CSV file. However, My attempts to 'transplant' the CSV to a sqlite file have failed
ATTEMPT 1, with credits to of @Brian of StackOverflow
import csv
import sqlite3
conn = sqlite3.connect('ExecDDBB.sqlite')
cur = conn.cursor()
#Execution,Offender Information,Last Statement URL,Last Name,First Name,TDCJNumber,Age,Date,Race,County,Last Statement
cur.execute('''CREATE TABLE IF NOT EXISTS RAWCSV
(Execution INTEGER PRIMARY KEY, Offender_Information TEXT UNIQUE, Last_Statement_URL TEXT UNIQUE, Last_Name TEXT, First_Name TEXT, TDCJNUmber INTEGER, Age INTEGER, Date TEXT, Race TEXT, County TEXT, Last_Statement TEXT)''')
with open ('offenders_data.csv', 'r') as f:
reader = csv.reader(f)
columns = next(reader)
query = 'insert into RAWCSV({0}) values ({1})'
query = query.format(','.join(columns), ','.join('?' * len(columns)))
for data in reader:
cur.execute(query, data)
cur.commit()
which gives me the error:
Traceback (most recent call last):
File "D:\Dropbox\Currículum Vitae\py4e\Capstone\CSVtoSQL(table9).py", line 14, in <module>
columns = next(reader)
File "C:\Users\Javier López\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4917: character maps to <undefined>
ATTEMPT 2 credit to mr. lordvan this page
import csv, uuid, dataset
import sqlite3
conn = sqlite3.connect('ExecDDBB.sqlite')
cur = conn.cursor()
#Execution,Offender Information,Last Statement URL,Last Name,First Name,TDCJNumber,Age,Date,Race,County,Last Statement
cur.execute('''CREATE TABLE IF NOT EXISTS RAWCSV
(Execution INTEGER PRIMARY KEY, Offender_Information TEXT UNIQUE, Last_Statement_URL TEXT UNIQUE, Last_Name TEXT, First_Name TEXT, TDCJNUmber INTEGER, Age INTEGER, Date TEXT, Race TEXT, County TEXT, Last_Statement TEXT)''')
def csv2dataset(fin, fout):
with open(fin, 'r') as finp:
inpdata = csv.DictReader(finp)
db = dataset.connect('sqlite:///' + fout)
table = db['RAWCSV']
for contact in inpdata:
contact['uuid'] = uuid.uuid4().hex
contact['verified'] = False
table.insert(contact)
if __name__ == '__main__':
csv2dataset('offenders_data.csv', 'ExecDDBB.sqlite')
cur.close()
which in turn gives me the following error
Traceback (most recent call last):
File "D:\Dropbox\Currículum Vitae\py4e\Capstone\CSVtoSQL(table9).py", line 23, in <module>
csv2dataset('offenders_data.csv', 'ExecDDBB.sqlite')
File "D:\Dropbox\Currículum Vitae\py4e\Capstone\CSVtoSQL(table9).py", line 17, in csv2dataset
for contact in inpdata:
File "C:\Users\Javier López\AppData\Local\Programs\Python\Python39\lib\csv.py", line 110, in __next__
self.fieldnames
File "C:\Users\Javier López\AppData\Local\Programs\Python\Python39\lib\csv.py", line 97, in fieldnames
self._fieldnames = next(self.reader)
File "C:\Users\Javier López\AppData\Local\Programs\Python\Python39\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4917: character maps to <undefined>
My idea is to transfer all the CSV content into a table called RAWCSV (or something similar), and then I would duplicate the table, drop the tables I'm not really interested in, and then proceed to clip the date to get just the year, and assign the length indicator. Afterwards I would run another program to create the histogram.
Any suggestions, or help with the debugging, please?