I have a table with about 200 columns. I need to take a dump of the daily transaction data for ETL purposes. Its a MySQL DB. I tried that with Python both using pandas dataframe as well as basic write to CSV file method. I even tried to look for the same functionality using shell script. I saw one such for oracle Database using sqlplus. Following are my python codes with the two approaches:
Using Pandas:
import MySQLdb as mdb
import pandas as pd
host = ""
user = ''
pass_ = ''
db = ''
query = 'SELECT * FROM TABLE1'
conn = mdb.connect(host=host,
user=user, passwd=pass_,
db=db)
df = pd.read_sql(query, con=conn)
df.to_csv('resume_bank.csv', sep=',')
Using basic python file write:
import MySQLdb
import csv
import datetime
currentDate = datetime.datetime.now().date()
host = ""
user = ''
pass_ = ''
db = ''
table = ''
con = MySQLdb.connect(user=user, passwd=pass_, host=host, db=db, charset='utf8')
cursor = con.cursor()
query = "SELECT * FROM %s;" % table
cursor.execute(query)
with open('Data_on_%s.csv' % currentDate, 'w') as f:
writer = csv.writer(f)
for row in cursor.fetchall():
writer.writerow(row)
print('Done')
The table has about 300,000 records. It's taking too much time with both the python codes.
Also, there's an issue with encoding here. The DB resultset has some latin-1 characters for which I'm getting some errors like : UnicodeEncodeError: 'ascii' codec can't encode character '\x96' in position 1078: ordinal not in range(128).
I need to save the CSV in Unicode format. Can you please help me with the best approach to perform this task.
A Unix based or Python based solution will work for me. This script needs to be run daily to dump daily data.