
I have a large SQL file (20 GB) that I would like to convert into CSV. I plan to load the file into Stata for analysis. I have enough RAM to hold the entire file (my computer has 32 GB of RAM).

The problem is that the solutions I have found online so far in Python (sqlite3) seem to require more RAM than my current system has in order to:

  • read the SQL
  • write the CSV

Here is the code I am using:

import sqlite3
import pandas as pd

# Connect to the database and read the whole table into one DataFrame
con = sqlite3.connect('mydata.sql')
query = 'select * from mydata'
data = pd.read_sql(query, con)
data.to_csv('export.csv')
con.close()

The SQL file contains about 15 variables, which can be timestamps, strings, or numerical values. Nothing really fancy.

I think one possible solution could be to read the SQL and write the CSV file one line at a time. However, I have no idea how to do that (in either R or Python).

Any help really appreciated!

ℕʘʘḆḽḘ
  • Can you post a small (very small) subset of your SQL file? And your code in Python or R too, if possible. – dickoa Nov 01 '15 at 20:49
  • Unfortunately I cannot post any subset. But the file contains various variables (timestamps, strings with any characters, maybe some Chinese as well, numbers, etc.). – ℕʘʘḆḽḘ Nov 01 '15 at 20:55
  • @dickoa see my edits above. thanks! – ℕʘʘḆḽḘ Nov 01 '15 at 20:57
  • You can use the 'edit' button to paste the code into the question; I think it's better there. If you can't post any subset of the file, then an example SQL file will help the people trying to answer you. – dickoa Nov 01 '15 at 21:00
  • @dickoa I am trying your method, but I get the following error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xee' in position 1: ordinal not in range(128)... do you know what I should do? – ℕʘʘḆḽḘ Nov 02 '15 at 13:05

3 Answers


You can read the SQL database in batches and write them to the file instead of reading the whole database at once. Credit to How to add pandas data to an existing csv file? for how to append to an existing CSV file.

import sqlite3
import pandas as pd

# Open the output file; UTF-8 avoids the UnicodeEncodeError on non-ASCII data,
# and newline='' prevents doubled line endings on Windows
f = open('output.csv', 'w', encoding='utf-8', newline='')
# Create a connection and get a cursor
connection = sqlite3.connect('mydata.sql')
cursor = connection.cursor()
# Execute the query
cursor.execute('select * from mydata')
# Grab the column names so the CSV gets a proper header row
columns = [description[0] for description in cursor.description]
# Get the data in batches
first = True
while True:
    # Read the next batch of rows
    df = pd.DataFrame(cursor.fetchmany(1000), columns=columns)
    # We are done if there are no data left
    if len(df) == 0:
        break
    # Write the batch; emit the header only once and skip the per-batch index
    df.to_csv(f, header=first, index=False)
    first = False

# Clean up
f.close()
cursor.close()
connection.close()
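
A variant of the same idea, for what it's worth: pandas can do the batching itself via the chunksize argument of read_sql, which then returns an iterator of DataFrames rather than one big frame. A minimal sketch, assuming the same file and table names as in the question:

import sqlite3
import pandas as pd

connection = sqlite3.connect('mydata.sql')
# chunksize makes read_sql yield DataFrames of 1000 rows at a time
# instead of loading the whole table into memory
with open('output.csv', 'w', encoding='utf-8', newline='') as f:
    for i, chunk in enumerate(pd.read_sql('select * from mydata', connection, chunksize=1000)):
        # write the header only with the first chunk
        chunk.to_csv(f, header=(i == 0), index=False)
connection.close()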
Till Hoffmann

Use the sqlite3 command-line program like this from the Windows command line or a UNIX shell:

sqlite3 -csv "mydata.sql" "select * from mydata;" > mydata.csv

If mydata.sql is not in the current directory, use its full path, and on Windows use forward slashes rather than backslashes.
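
For example, with a hypothetical path (C:/data is just a placeholder):

sqlite3 -csv "C:/data/mydata.sql" "select * from mydata;" > mydata.csv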

Alternatively, run sqlite3 on its own:

sqlite3

and enter these commands at the sqlite prompt:

.open "mydata.sql"
.output mydata.csv
.mode csv
select * from mydata;
.quit

(Or put them in a file called run, say, and use sqlite3 < run.)
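
To spell that out (a sketch; the file name run is arbitrary), save the five commands above into a plain-text file with Notepad, then from the same directory run:

sqlite3 < run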

G. Grothendieck
  • Thanks Grothendieck, but your solution is slightly too complicated for me. How do I create a file in Windows? Do I need admin rights for that (I don't have them)? Why would this solution overcome the problems I explained above? Thanks!! – ℕʘʘḆḽḘ Nov 01 '15 at 23:15
  • You can use Notepad to create the file, or not create the file at all and just type the lines into sqlite3 manually. You don't need admin rights. – G. Grothendieck Nov 02 '15 at 00:14

Load the .sql file into a MySQL database and export it as CSV.

Commands to load a MySQL dump file into a MySQL database:

Create a MySQL database:

create database <database_name>;

Load the dump into it (note that it is the mysql client, not mysqldump, that reads a dump file back in):

mysql -u root -p <database_name> < dumpfilename.sql

Command to export a MySQL table as CSV:

mysql -u root -p
use <database_name>

SELECT * INTO OUTFILE 'file.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
FROM <table_name>;
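
Since the table contains free-form strings (possibly with embedded commas or newlines), it is safer to quote the fields as well; note too that INTO OUTFILE writes the file on the database server, and the server's secure_file_priv setting may restrict where it can write. A hedged variant of the same export:

SELECT * INTO OUTFILE 'file.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM <table_name>;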
Vivek