1

I have a table with about 200 columns. I need to take a dump of the daily transaction data for ETL purposes. Its a MySQL DB. I tried that with Python both using pandas dataframe as well as basic write to CSV file method. I even tried to look for the same functionality using shell script. I saw one such for oracle Database using sqlplus. Following are my python codes with the two approaches:

Using Pandas:

import MySQLdb as mdb
import pandas as pd

host = ""
user = ''
pass_ = ''
db = ''

query = 'SELECT * FROM TABLE1'

conn = mdb.connect(host=host,
                   user=user, passwd=pass_,
                   db=db)

df = pd.read_sql(query, con=conn)
df.to_csv('resume_bank.csv', sep=',')

Using basic python file write:

import MySQLdb
import csv
import datetime

currentDate = datetime.datetime.now().date()

host = ""
user = ''
pass_ = ''
db = ''
table = ''

con = MySQLdb.connect(user=user, passwd=pass_, host=host, db=db, charset='utf8')
cursor = con.cursor()

query = "SELECT * FROM %s;" % table
cursor.execute(query)

with open('Data_on_%s.csv' % currentDate, 'w') as f:
    writer = csv.writer(f)
    for row in cursor.fetchall():
        writer.writerow(row)

print('Done')

The table has about 300,000 records. It's taking too much time with both the python codes.

Also, there's an issue with encoding here. The DB resultset has some latin-1 characters for which I'm getting some errors like : UnicodeEncodeError: 'ascii' codec can't encode character '\x96' in position 1078: ordinal not in range(128).

I need to save the CSV in Unicode format. Can you please help me with the best approach to perform this task.

A Unix based or Python based solution will work for me. This script needs to be run daily to dump daily data.

Cong Ma
  • 10,692
  • 3
  • 31
  • 47
amulya349
  • 1,210
  • 2
  • 17
  • 26

3 Answers3

2

You can achieve that just leveraging MySql. For example:

SELECT * FROM your_table WHERE...
INTO OUTFILE 'your_file.csv'
FIELDS TERMINATED BY ',' 
OPTIONALLY ENCLOSED BY '"'
FIELDS ESCAPED BY '\'
LINES TERMINATED BY '\n';

if you need to schedule your query put such a query into a file (e.g., csv_dump.sql) anche create a cron task like this one

00 00 * * * mysql -h your_host -u user -ppassword < /foo/bar/csv_dump.sql
floatingpurr
  • 7,749
  • 9
  • 46
  • 106
  • Instead of running SQL query, I need to run a script to do this task so that I can use a scheduler. This will help me get a good control over the scripts because the script will do other tasks too like creating directories. – amulya349 Apr 13 '18 at 09:58
  • You can schedule the query as a script! @amulya349 take a look at the edit ; ) – floatingpurr Apr 13 '18 at 09:59
  • Thanks for the answer. The SQL will work only if my MYSQL server is running on the same server as the script. But, my MYSQL DB is a cloud-managed one like of AWS. There this SQL won't do the work. What are your suggestions? – amulya349 Apr 13 '18 at 10:21
  • Hope This Helps-https://cloud.google.com/sql/docs/mysql/import-export/exporting#csv – Kedar Limaye Apr 13 '18 at 10:24
  • 1
    @floatingpurr Worked for me. Thanks a lot! – amulya349 Apr 16 '18 at 06:22
  • Good news! Feel free to accept the answer if it solved :) – floatingpurr Apr 16 '18 at 07:47
2

For strings this will use the default character encoding which happens to be ASCII, and this fails when you have non-ASCII characters. You want unicode instead of str.

rows = cursor.fetchall()
f = open('Data_on_%s.csv' % currentDate, 'w')
myFile = csv.writer(f)
myFile.writerow([unicode(s).encode("utf-8") for s in rows])
fp.close()
Jay Shankar Gupta
  • 5,918
  • 1
  • 10
  • 27
1

You can use mysqldump for this task. (Source for command)

mysqldump -u username -p --tab  -T/path/to/directory dbname table_name --fields-terminated-by=',' 

The arguments are as follows:

  • -u username for the username
  • -p to indicate that a password should be used
  • -ppassword to give the password via command line
  • --tab Produce tab-separated data files

For mor command line switches see https://dev.mysql.com/doc/refman/5.5/en/mysqldump.html

To run it on a regular basis, create a cron task like written in the other answers.

rollstuhlfahrer
  • 3,988
  • 9
  • 25
  • 38