1

I am trying to write a python script that is going to load the tables that I created in pyhton using SQL and populate them with data automatically that is coming from a text file. I am stuck on basic coding. I do have a general idea but I am getting errors when I try to run this approach. I have created 2 tables. I have read the file. the file is a comma seperated text file with no headers.

first 3 lines of the file looks like this.

+ ---- + ----- + -------------------- + -------- + - + --- + ----- +
| John | Smith | 111 N. Wabash Avenue | plumber  | 5 | 1.0 | 200   |
| John | Smith | 111 N. Wabash Avenue | bouncer  | 5 | 1.0 | 200   |
| Jane | Doe   | 243 S. Wabash Avenue | waitress | 1 | 5.0 | 10000 |
+ ---- + ----- + -------------------- + -------- + - + --- + ----- +

import sqlite3
conn= sqlite3.connect('csc455.db')
c = conn.cursor()

#Reading the data file
fd = open ('C:/Users/nasia/Documents/data_hw2.txt','r')
data = fd.readlines()

#Creating Tables
>>> L = """create table L
... (first text, last text, address text, job text, LNum integer,
... constraint L_pk
... primary key(first, last, address, job),
... constraint L_fk
... foreign key (LNum) references LN(LNum)
... );"""
>>> c.execute(L)

LN = """create table LN
... (
... LNum integer, Interest float, Amount, Integer,
... constraint LN_pk
 ... primary key (LNum)
... );"""
 c.execute(LN)

#Inserting into database
for elt in data:
...     currentRow = elt.split(", ")[:-1]
...     insert = """(insert into LN values (%s, %s, %s);, %(currentRow[4], currentRow[5], currentRow[6]))"""
...     c.execute(insert)

There is some syntax error here. The code stops working. I cannot figure out what I am doing wrong. The error is Traceback (most recent call last): File "", line 4, in OperationalError: near "(": syntax error

I can not figure out what am I doing wrong

KindaTechy
  • 1,041
  • 9
  • 25
nasia jaffri
  • 803
  • 2
  • 12
  • 21
  • 1
    What does `data_hw2.txt` look like? What's your database schema? How do you want to map from one to the other? – abarnert Oct 06 '13 at 03:48
  • In the future, it really helps to know which statement raised the error instead of making us guess! – abarnert Oct 07 '13 at 17:36
  • As a side note, you don't have to put `;` on individual SQL statements passed to `execute`; that's only required when you're using the command-line tool, or executing a SQL script. – abarnert Oct 07 '13 at 17:47

2 Answers2

3

If you can use standard sqlite3 utility, you can do it much easier:

sqlite3 -init mydata.sql mydatabase.db ""

simply call this line from your python script, and you're done.

This will read any text file that contains valid SQL statements, and will create mydatabase.db if it did not exist. What's more important, it supports statements spanning more than one line, and also properly ignores SQL comments using both --comment syntax and C/C++ like /*comment*/ syntax.

Typically your mydata.sql content should look like this:

BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS table1 (
    id INTEGER PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(32)
);
INSERT INTO table1 (name) VALUES
('John'),
('Jack'),
('Jill');
-- more statements ...
COMMIT;
mvp
  • 111,019
  • 13
  • 122
  • 148
  • I am very new to Python and sqlite3. We can not manually insert the values, but put it in a loop. Something like this: for elt in data: currentRow = elt.split(',')[:-1] #insert for tableA using first 3 entries normalizedTableA = [currentRow[0], currentRow[1], currentRow[3]) newInsert = "INSERT INTO TableA VALUES ('%s', '%s', '%s');" % (normalizedTableA[0], normalizedTableA[1], normalizedTableA[2]) # insert for tableB using 4th and 5th entry normalizedTableB = [currentRow[4], currentRow[5]) newInsert = "INSERT INTO TableB VALUES ('%s', '%s');" % (normalizedTableB[0],normalizedTableB[1]) – nasia jaffri Oct 06 '13 at 12:19
  • I have explained the question in detail now. Can you please have a look and help me. Thanks so much for your input yesterday. – nasia jaffri Oct 07 '13 at 14:18
  • The point of the exercise is that they have data in some other format (in this case, a weird CSV dialect) and need to load it into the database. Generating the SQL statements is the hard part, not executing them. – abarnert Oct 07 '13 at 17:48
3

You haven't explained what format the data are in, or what your table structure is, or how you want to map them, which makes this difficult to answer. But I'll make up my own, and answer that, and hopefully it will help:

infile.txt:

CommonName,Species,Location,Color
Black-headed spider monkey,Ateles fusciceps,Ecuador,black
Central American squirrel monkey,Saimiri oerstedii,Costa Rica,orange
Vervet,Chlorocebus pygerythrus,South Africa,white

script.py

import csv
import sqlite3

db = sqlite3.connect('outfile.db')
cursor = db.cursor()
cursor.execute('CREATE TABLE Monkeys (Common Name, Color, Species)')
cursor.execute('''CREATE TABLE MonkeyLocations (Species, Location,
                  FOREIGN KEY(Species) REFERENCES Monkeys(Species))''')
with open('infile.txt') as f:
    for row in csv.DictReader(f):
        cursor.execute('''INSERT INTO Monkeys 
                          VALUES (:CommonName, :Color, :Species)''', row)
        cursor.execute('''INSERT INTO MonkeyLocations 
                          VALUES (:Species, :Location)''', row)
db.commit()
db.close()

Of course if your real data are in some other format than CSV, you'll use different code to parse the input file.

I've also made things slightly more complex than your real data might have to deal with—the CSV columns don't have quite the same names as the SQL columns.

In other ways, your data might be more complex—e.g., if your schema has foreign keys that reference an auto-incremented row ID instead of a text field, you'll need to get the rowid after the first insert.

But this should be enough to give you the idea.


Now that you've shown more details… you were on the right track (although it's wasteful to call readlines instead of just iterating over fd directly, and you should close your db and file, ideally with a with statement, …), but you've got a simple mistake right near the end that prevents you from getting any farther:

insert = """(insert into LN values (%s, %s, %s);, %(currentRow[4], currentRow[5], currentRow[6]))"""
c.execute(insert)

You've put the formatting % expression directly into the string, instead of using the operator on the string. I think what you were trying to do is:

insert = """insert into LN values (%s, %s, %s);""" % (currentRow[4], currentRow[5], currentRow[6])
c.execute(insert)

However, you shouldn't do that. Instead, do this:

insert = """insert into LN values (?, ?, ?);"""
c.execute(insert, (currentRow[4], currentRow[5], currentRow[6]))

What's the difference?

Well, the first one just inserts the values into the statement as Python strings. That means you have to take care of converting to the proper format, quoting, escaping, etc. yourself, instead of letting the database engine decide how to deal with each value. Besides being a source of frustrating bugs when you try to save a boolean value or forget to quote a string, this also leaves you open to SQL injection attacks unless you're very careful.

There are other problems besides that one. For example, most databases will try to cache repeated statements, and it's trivial to tell that 3000 instances of insert into LN values (?, ?, ?) are all the same statement, but less so to tell that insert into LN values (5, 1.0, 200) and insert into LN values (1, 5.0, 5000) are the same statement.

abarnert
  • 354,177
  • 51
  • 601
  • 671
  • ".txt format" doesn't tell you nearly enough about how to parse it. It's some kind of text. Is each record a line, with the fields separated by commas, with a header line? Is each record separated by blank lines, with the fields as Key: Value lines? Or…? There are infinite possibilities here, and you have to know what you have before you can read it in. – abarnert Oct 06 '13 at 03:57
  • I am given following table: (First, Last, Address, Job, LNumber, Amount, Interest) with PK as (First, Last, Jon, LNumber. The FDs are First, Last determines Address. LNumber determines Amount, Interest. I have to decompose it to 3NF and then write python to load the tables and populate them with the data in text file, either through dictionaries or for loop. first 3 lines of the data are like this. John, Smith, 111 N. Wabash Avenue, plumber, 5, 1.0, 200 John, Smith, 111 N. Wabash Avenue, bouncer, 5, 1.0, 200 Jane, Doe, 243 S. Wabash Avenue, waitress, 1, 5.0, 10000. Please help. – nasia jaffri Oct 06 '13 at 12:05
  • @abarnet, Thanks for helping me yesterday. I am new to the site, so d=did not state my question clear enough. Can you please have a look at the code and point out where I am wrong. Thanks. – nasia jaffri Oct 07 '13 at 14:17