
I'm writing a workflow that first converts a flat CSV into a .db file for downstream processing. The file seems to write OK, but when I try to read it back it throws errors saying that the table doesn't exist. Here is my code:

import pandas as pd
import sqlite3

# simulate some data
df=pd.DataFrame({'DATE':['01-01-2000','01-02-2000','01-03-2000','01-04-2000','01-05-2000'], 
                    'A1':[1,1,1,1,1],
                    'A2':[1,2,3,4,5],
                    'B':[2,2,3,3,2]})
df.to_csv('my_test.csv', index=False)

# function to write csv to .db file
def pandas_csv_to_db_converter(csvfile, table_name):

    df = pd.read_csv(csvfile)
    conn = sqlite3.connect(table_name)
    df.to_sql(table_name, conn, if_exists='append', index=False)

    return

pandas_csv_to_db_converter('my_test.csv', 'my_test.db')

So far so good: I have a CSV and a .db file of similar sizes. Now I want to read the table back in to make sure it's all OK.

def pandas_db_reader(db_file):

    conn = sqlite3.connect(db_file)
    c = conn.cursor()
    c.execute("SELECT name FROM sqlite_master WHERE type='table';")
    names = [tup[0] for tup in c.fetchall()]
    print(names)

    table = pd.read_sql_query("SELECT * from {}".format(db_file), conn)
    print(table)
    conn.close()
    return

out_db = 'my_test.db'
pandas_db_reader(out_db)

The string 'my_test.db' is printed as a table in a list on the console, indicating that it is a legitimate table, but then the code throws an error:

    cur.execute(*args)
sqlite3.OperationalError: no such table: my_test.db

During handling of the above exception, another exception occurred:

    cur.execute(*args)
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT * from my_test.db': no such table: my_test.db

It's not very intuitive why this error is occurring. I've looked at quite a lot of tutorials and Stack Overflow questions on how to do this, but it still doesn't seem to run. I don't have much experience using SQL, so any help would be really appreciated! Thanks in advance!

EDIT: typo corrected:
pandas_csv_to_db_converter('my_test.csv', my_test_db.db) 

is now

pandas_csv_to_db_converter('my_test.csv', 'my_test.db')
user3062260
  • why not just do `pandas.DataFrame.to_sql`? – MEdwin Jun 24 '19 at 14:42
  • basically since you already have the dataframe, you can push it into sqlite tables like this: `engine = sqlalchemy.create_engine('sqlite:///my_db.sqlite') df.to_sql('Resolved', engine, if_exists='append')` (expanded into a runnable sketch after this thread) – MEdwin Jun 24 '19 at 14:44
  • Thanks for your help, the csv comes from elsewhere, I was just making some test data for a working example. – user3062260 Jun 24 '19 at 15:03
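For reference, here is MEdwin's suggestion expanded into a runnable sketch. The my_db.sqlite path and the Resolved table name come from the comment above; the sample DataFrame is an assumed placeholder:

import pandas as pd
import sqlalchemy

df = pd.DataFrame({'A': [1, 2, 3]})  # placeholder data (assumed)

# a SQLAlchemy engine pointing at a SQLite file, as in the comment
engine = sqlalchemy.create_engine('sqlite:///my_db.sqlite')
df.to_sql('Resolved', engine, if_exists='append', index=False)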

1 Answer


You're passing db_file as the table name in pandas_db_reader(). You need to pass the actual table name (TABLE_NAME below) to the SQL query:

table = pd.read_sql_query("SELECT * from {}".format(TABLE_NAME), conn)
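For instance, a corrected reader could take the table name as a separate argument. This is a sketch: the two-argument signature is an addition, and it assumes the table was written under the name given to df.to_sql:

import sqlite3
import pandas as pd

def pandas_db_reader(db_file, table_name):
    conn = sqlite3.connect(db_file)
    c = conn.cursor()
    # list the tables that actually exist in this file
    c.execute("SELECT name FROM sqlite_master WHERE type='table';")
    print([tup[0] for tup in c.fetchall()])

    # query the table name, not the database file name
    table = pd.read_sql_query("SELECT * FROM {}".format(table_name), conn)
    print(table)
    conn.close()
    return table

pandas_db_reader('my_test.db', 'test')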

The underlying issue is that my_test.db is not usable as an unquoted table name: SQLite parses the dot in an unquoted identifier as a database.table separator, so SELECT * from my_test.db looks for a table named db in an attached database named my_test. The simplest fix is a plain table name: change the table name value in pandas_csv_to_db_converter and pandas_db_reader, i.e.

...
df.to_sql("test", conn, if_exists='append', index=False)
...

and

...
table = pd.read_sql_query("SELECT * from {}".format("test"), conn)
...
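Alternatively (an addition, not part of the original answer): the file the asker already created does contain a table literally named my_test.db, and it can be read back without rewriting anything by double-quoting the identifier so SQLite treats the whole string, dot included, as one table name:

import sqlite3
import pandas as pd

conn = sqlite3.connect('my_test.db')
# the double quotes stop SQLite parsing the dot as a
# database.table separator
table = pd.read_sql_query('SELECT * FROM "my_test.db"', conn)
print(table)
conn.close()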
Wytamma Wirth
  • How would I find out what the TABLE_NAME is? This is basically what I thought, which is why I put some lines into 'pandas_db_reader' to find out the names - I guess the list 'names' is not giving me what I thought it was! – user3062260 Jun 24 '19 at 13:29
  • FYI I already tried: c.execute("SELECT * FROM INFORMATION_SCHEMA.TABLES") and c.execute("SELECT * FROM sys.Tables") – user3062260 Jun 24 '19 at 13:35
  • Your table name is defined here `pandas_csv_to_db_converter('my_test.csv', my_test_db.db)`. Where is the `my_test_db` object from? Note: `my_test_db.db` is not being passed as a string. – Wytamma Wirth Jun 24 '19 at 13:38
  • Ah sorry, my_test_db.db is a typo on Stack Overflow; it should be 'my_test.db' everywhere. I'll edit the question for consistency. It's correct in my program and still fails to run. – user3062260 Jun 24 '19 at 13:56
  • Thanks for your help, this test example now works. I think the problem started in 'pandas_csv_to_db_converter': I was passing in full file names. If I move into the working directory and run the whole workflow from there with just the filename, without the full path, it works fine! It looks like it was using the full file path as the table name, which is why I couldn't recover the table in 'pandas_db_reader' (see the sketch after this thread); either that, or maybe it was saving to working memory rather than to a file, which gets deleted when the function has finished running. Anyway, many thanks! – user3062260 Jun 24 '19 at 15:38
  • No problem, can you please accept my answer as correct. :) – Wytamma Wirth Jun 25 '19 at 00:07
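Following up on the asker's last comment: if a full file path was being used as the table name, a small helper that strips the path down to a bare name sidesteps the problem. This is a hypothetical helper, not code from the thread:

import os

def table_name_from_path(path):
    # hypothetical helper: '/some/dir/my_test.csv' -> 'my_test'
    return os.path.splitext(os.path.basename(path))[0]

print(table_name_from_path('/some/dir/my_test.csv'))  # prints: my_test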