
I have data stored in CSV files in multiple folders that I want to load into multiple SQL tables using MySQL on an Ubuntu system. Each table and file follows this schema (the files don't have the id field):

+ ------ + -------- + -------- + --------- + ---------- +
| SPO_Id | SPO_Name | SPO_Date | SPO_Price | SPO_Amount |
+ ------ + -------- + -------- + --------- + ---------- +

Each file contains pricing and sales data for a single day. Unfortunately, the files are not named after their date; instead, they are stored in folders that are named after the date. Here's an example diagram of what the directory looks like:

      ------> 20170102 ------> prices.csv
     /
    /
Exmpl ------> 20170213 ------> prices.csv
    \
     \
      ------> 20170308 ------> prices.csv

Here is a query I've written that pulls data from a file and stores it into a table:

use pricing ; # the database I want the tables in
drop table if exists SP_2017_01_02 ;

create table SP_2017_01_02 (
    SPO_Id int not null primary key auto_increment,
    SPO_Name varchar(32),
    SPO_Date date,
    SPO_Price float,
    SPO_Amount int
);

load data local infile '/Exmpl/20170102/prices.csv'
    into table SP_2017_01_02
    fields terminated by ','
    lines terminated by '\n'
    ignore 1 lines # First line contains field name information
    (SPO_Name, SPO_Date, SPO_Price, SPO_Amount) ;

select * from SP_2017_01_02 ;

show tables ;

This query works fine for loading one table in at a time; however, because I have hundreds of tables, I need to automate this process. I've looked around on SO and here are a few things I've found:

Here is a question similar to mine, only this question references SQL Server. The answer gives a suggestion of what to do without any real substance.

This question is also very similar to mine, only this is specifically using SSIS, to which I don't have access (and the question is left unanswered)

This post suggests using a control file reference, but that is for SQL*Loader and Oracle.

Using python may be the way to go, but I've never used it before, and my problem seems too complicated to be a good starting point.

This one and this one also use python, but they're just updating one table with data from one file.

I've worked a lot in SQL Server, but I'm fairly new to MySQL. Any help is greatly appreciated!

Update

I have attempted to do this using dynamic SQL in MySQL. Unfortunately, MySQL only supports dynamic SQL inside stored procedures, and load data is not allowed inside a stored procedure. As @RandomSeed pointed out, this cannot be done with MySQL alone. I'm going to take his advice and attempt to write a shell/python script to handle this.

I'll leave this question open until I (or someone else) can come up with a solid answer.
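For anyone curious what the shell route might look like: below is an untested sketch that builds the SQL for one dated folder and could be piped into the mysql client in a loop. The /Exmpl path, the SP_ table prefix, and the column list all come from the question; the mysql invocation at the bottom is an assumption about your setup (user, password, --local-infile) and is left commented out.

```shell
#!/bin/sh
# build_sql DAY CSVPATH -> prints the SQL for that day's file on stdout
build_sql() {
    day=$1
    csv=$2
    # 20170102 -> SP_2017_01_02
    table="SP_$(echo "$day" | sed 's/^\(....\)\(..\)\(..\)$/\1_\2_\3/')"
    printf "drop table if exists %s ;\n" "$table"
    printf "create table %s (\n" "$table"
    printf "    SPO_Id int not null primary key auto_increment,\n"
    printf "    SPO_Name varchar(32),\n"
    printf "    SPO_Date date,\n"
    printf "    SPO_Price float,\n"
    printf "    SPO_Amount int\n"
    printf ");\n"
    printf "load data local infile '%s'\n" "$csv"
    printf "    into table %s\n" "$table"
    printf "    fields terminated by ','\n"
    printf "    lines terminated by '\\\\n'\n"
    printf "    ignore 1 lines\n"
    printf "    (SPO_Name, SPO_Date, SPO_Price, SPO_Amount) ;\n"
}

# Loop over the date folders and feed each statement to mysql, e.g.:
# for dir in /Exmpl/*/ ; do
#     day=$(basename "$dir")
#     build_sql "$day" "${dir}prices.csv" | mysql --local-infile=1 -u you -p pricing
# done
```

The loop itself never has to know the dates in advance; it just takes whatever folders exist under /Exmpl.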

KindaTechy

2 Answers

So once you have a SQL query/function/script that loads a single file into a table, which it looks like you do (or could build an equivalent in python fairly simply), using python to loop through the directory structure and collect the filenames is straightforward. If you can pass a new csv path as the infile '/Exmpl/20170102/prices.csv' parameter each time and call your SQL script from within python, you should be good.

I don't have much time right now, but I wanted to show you how you could get those filename strings using python.

import os

prices_csvs = []
for root, dirs, files in os.walk(os.path.join('insert_path_here', 'Exmpl')):
    for f in files:
        if f == 'prices.csv':
            prices_csvs.append(os.path.join(root, f))
            break # optional, use if there only is one prices.csv in each subfolder

for csv_file in prices_csvs:
    # csv_file is a string of the path for each prices.csv
    # if you can insert it as the `infile` parameter and run the sql, you are done
    # admittedly, i don't know how to do this at the moment
    pass

os.walk goes down through each subdirectory, giving the name root to the path to that folder, listing all the directories as dirs and files as files stored there. From there it's a simple check to see if the filename matches what you're looking for, and storing it in a list if it does. Looping over the list yields strings containing the path to each prices.csv in Exmpl.
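To make that concrete, here is a tiny self-contained illustration: it builds a throwaway copy of the Exmpl layout from the question in a temp directory, walks it, and recovers the date folder names (the temp path stands in for the real base path).

```python
import os
import tempfile

# Build a throwaway copy of the directory layout from the question.
base = tempfile.mkdtemp()
for day in ('20170102', '20170213', '20170308'):
    os.makedirs(os.path.join(base, day))
    open(os.path.join(base, day, 'prices.csv'), 'w').close()

# Walk it exactly as in the snippet above.
prices_csvs = []
for root, dirs, files in os.walk(base):
    for f in files:
        if f == 'prices.csv':
            prices_csvs.append(os.path.join(root, f))

# The date folder is the parent directory of each hit:
days = sorted(os.path.basename(os.path.dirname(p)) for p in prices_csvs)
print(days)  # → ['20170102', '20170213', '20170308']
```

Those folder names are exactly what you'd want for building per-date table names later.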

Hope that sheds a little light on how python could help.

Charlie G
  • Thank you for your input. As per my update above, I'm going to try this with either a shell or python script, and this may be something to get me started. – KindaTechy Mar 30 '17 at 19:46
  • If you're using a shell script, see the following on how to get python functions to return values to a shell http://stackoverflow.com/questions/2115615/assigning-value-to-shell-variable-using-a-function-return-value-from-python http://stackoverflow.com/questions/26162394/convert-a-python-data-list-to-a-bash-array – Charlie G Mar 30 '17 at 20:19
  • If you're going python, what package would you use for the sql stuff? – Charlie G Mar 30 '17 at 20:20
  • Thank you for those links--very helpful. As per your question, that is actually something that I would ask you, haha. I don't have any experience with python; any recommendations? – KindaTechy Mar 30 '17 at 21:06
  • Decided to just use python with MySQLdb module. – KindaTechy Mar 30 '17 at 23:52
  • This definitely got me started for now. Any bumps I have in the future I'll post as additional questions. Thanks for your help! – KindaTechy Mar 31 '17 at 17:17
  • Ah good! Sorry weird day yesterday, and I haven't delved into SQL that much yet. Glad you could find something. Good luck! – Charlie G Mar 31 '17 at 19:26

I've marked Charlie's answer as the correct answer because, although he does not fully answer the question, he gave me a great start. Below is the code for anyone who might want to see how to load csv files into MySQL. The basic idea is to dynamically construct a string in Python and then execute that string in MySQL.

#!/usr/bin/python
import os
import MySQLdb # Use this module in order to interact with SQL

# Find all the file names located in this directory
prices_csvs = []
for root, dirs, files in os.walk(os.path.join('insert_path_here', 'Exmpl')):
    for f in files:
        if f == 'prices.csv':
            prices_csvs.append(os.path.join(root, f))
            break

# Connect to the MySQL database
db = MySQLdb.connect(host ="<Enter Host Here>", user = "<Enter User here>", passwd = "<Enter Password Here>", db = "<Enter Database name here>" )

# must create cursor object
cur = db.cursor()

for csv_file in prices_csvs:

    directory = "'" + csv_file + "'"    

    table = csv_file[56:64] # This extracts the name of the table from the directory

    sql_string1 = "drop table if exists SD" + table + " ;\n"

    sql_string2 = "create table SD" + table + " ( \n\
    <Enter your fields here> \n\
    ); \n"

    sql_string3 = "load data local infile " + directory + " \n\
    into table SD" + table + " \n\
    fields terminated by ',' \n\
    lines terminated by " + repr('\n') + " \n\
    ignore 1 lines ;\n"

    # Print out the strings for debugging
    print sql_string1
    print sql_string2
    print sql_string3

    # Execute your SQL statements
    cur.execute(sql_string1)
    cur.execute(sql_string2)
    cur.execute(sql_string3)
    db.commit()

db.close()

While debugging, I found it very helpful to copy the printed SQL statement and paste it into MySQL to confirm that the strings were being constructed successfully.
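One fragile spot worth flagging: the csv_file[56:64] slice only works while every path has exactly that length, so moving the script or the data silently breaks it. Since prices.csv always sits directly inside the date folder, a sturdier sketch derives the table name from the path itself (the 'SD' prefix here just follows the naming used above):

```python
import os

def table_for(csv_file):
    # prices.csv sits directly inside the date folder, so the table
    # name is just the parent directory's name with the 'SD' prefix.
    day = os.path.basename(os.path.dirname(csv_file))
    return 'SD' + day

print(table_for('/Exmpl/20170102/prices.csv'))  # → SD20170102
```

This keeps working no matter where the Exmpl tree is mounted.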

KindaTechy