
I generate .csv files with a Python script by writing a pandas DataFrame out via to_csv, using UTF-8 encoding.

consEx.to_csv(os.path.join(base_dir, "Database/Tables/Consumption Expenditure/consumptionExpenditure.csv"), encoding='utf8', index=False)

Next, I upload the file to Amazon RDS MySQL via LOAD DATA LOCAL INFILE, into a table whose charset is also set to utf8.

CREATE TABLE IF NOT EXISTS Consumer.Expenditure (
  ceID INT NOT NULL AUTO_INCREMENT,
  ceCategory INT NOT NULL,
  year INT NULL,
  countryID INT NOT NULL,
  ceYoY DEC(15,2) NULL,
  dateCreated DATE NOT NULL,
  lastModified DATETIME NULL DEFAULT NULL,
  dateDeleted DATETIME NULL DEFAULT NULL,
  PRIMARY KEY (ceID),
  CONSTRAINT ce_fk_countries
    FOREIGN KEY (countryID)
    REFERENCES ConsumerAlpha.Countries (countryID)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
CHARSET utf8
ENGINE = InnoDB;
SHOW WARNINGS;

Now, the upload ONLY works if I first open and manually save the .csv file (same name, same location). Otherwise it fails without an error; it simply doesn't import anything. However, if I open, save, and close the file, the upload works perfectly.

LOAD DATA LOCAL
INFILE '/Users/xxx/Dropbox/Data/Database/Tables/Consumption Expenditure/consumptionExpenditure.csv'
INTO TABLE Consumer.Expenditure
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r'
IGNORE 1 LINES
(ceCategory, year, countryID, ceYoY, dateCreated)
SET dateCreated = NOW();
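
A quick way to see whether the manual open-and-save actually changes the file (the line endings, or a UTF-8 byte-order mark) is to compare the raw bytes before and after saving; a minimal sketch, assuming the same path as in the LOAD DATA statement:

path = '/Users/xxx/Dropbox/Data/Database/Tables/Consumption Expenditure/consumptionExpenditure.csv'

# Read the raw bytes so line terminators and any BOM are visible
with open(path, 'rb') as f:
    raw = f.read()

crlf = raw.count(b'\r\n')
print('UTF-8 BOM present: %s' % raw.startswith(b'\xef\xbb\xbf'))
print('CRLF (\\r\\n) endings: %d' % crlf)
print('bare CR (\\r) endings: %d' % (raw.count(b'\r') - crlf))
print('bare LF (\\n) endings: %d' % (raw.count(b'\n') - crlf))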

Is there anything I'm missing that would let me avoid this step? For large tables, there may be many .csv files. I've looked around quite a bit but haven't come across this particular issue anywhere. Thanks!

Stefan
  • "open and manually save the csv file" ... can you describe this more? Does your editor change the file in any way? – mattexx May 21 '13 at 17:24
  • possible duplicate of [How to insert pandas dataframe via mysqldb into database?](http://stackoverflow.com/questions/16476413/how-to-insert-pandas-dataframe-via-mysqldb-into-database) ... on second thought, it's perhaps the opposite; nevertheless related. – Andy Hayden May 21 '13 at 18:03

1 Answer


Consider skipping the CSV step by using pandas.io.sql.write_frame.

import MySQLdb
conn = MySQLdb.connect(host='...', user='...', passwd='...', db='...')

import pandas.io.sql as psql
psql.write_frame(dataframe, 'TableName', conn, 'mysql')

If you are appending to an existing table, use the keyword argument if_exists='append'.
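
For example, appending might look like this (a minimal sketch reusing the conn and dataframe from above):

# Append rows rather than failing when the table already exists
psql.write_frame(dataframe, 'TableName', conn, flavor='mysql', if_exists='append')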

Edit: I forgot that @Andy wrote essentially the same answer a while back: https://stackoverflow.com/a/16477603/1221924. As he notes, we added MySQL support but neglected to document it.

Dan Allan