
I have been scraping csv files from the web every minute and storing them into a directory.

The files are being named according to the time of retrieval:

name = 'train'+str(datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S"))+'.csv'

I need to upload each file into a database created on some remote server.

How can I do the above?

Killer Queen
  • You can use scp to transfer them to the server (https://kb.iu.edu/d/agye) and then use a python script to load the transactions from csv into the database. – Jonas Jun 11 '18 at 07:28
  • 2
    Possible duplicate of [How to import CSV file to MySQL table](https://stackoverflow.com/questions/3635166/how-to-import-csv-file-to-mysql-table) – SHR Jun 11 '18 at 07:30

1 Answer


You can use pandas and sqlalchemy to load CSV files into databases. I use MSSQL, and my code looks like this:

import os
import pandas as pd
import sqlalchemy as sa 

server = 'your server'
database = 'your database'
directory = 'your csv directory'

# Create the engine once, outside the loop
engine = sa.create_engine('mssql+pyodbc://' + server + '/' + database
                          + '?driver=SQL+Server+Native+Client+11.0')

for filename in os.listdir(directory):  # iterate over files
    df = pd.read_csv(os.path.join(directory, filename), sep=',')
    tableName = os.path.splitext(filename)[0]  # removes .csv extension
    df.to_sql(tableName, con=engine, dtype=None)  # send data to server

By setting the dtype parameter you can control the datatype conversion (e.g. if you want smallint instead of integer).
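As a sketch of what a dtype mapping can look like (the column names, sample data, and the in-memory SQLite engine below are illustrative assumptions, not part of the original setup):

```python
import pandas as pd
import sqlalchemy as sa

# Illustrative data: two columns we want stored with explicit SQL types
df = pd.DataFrame({"id": [1, 2], "price": [9.99, 12.50]})

# In-memory SQLite engine purely for demonstration
engine = sa.create_engine("sqlite://")

# dtype maps column names to SQLAlchemy type objects, overriding
# the defaults pandas would otherwise pick
df.to_sql(
    "trains",
    con=engine,
    index=False,
    dtype={"id": sa.types.SmallInteger(), "price": sa.types.Float()},
)

with engine.connect() as conn:
    rows = conn.execute(sa.text("SELECT id, price FROM trains")).fetchall()
```

Any column not listed in the dtype dict still gets the default inferred type.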

To ensure you don't write the same file/table twice, I would suggest keeping a logfile in the directory where you record which CSV files have been written to the DB, and then excluding those in your for-loop.

McKenzie
  • Does the above code read through each csv file line by line? It isn't reading in mine –  Jun 15 '18 at 06:41