I generate .csv files from a Python script by writing a pandas DataFrame with to_csv, using UTF-8 encoding:
consEx.to_csv(os.path.join(base_dir, "Database/Tables/Consumption Expenditure/consumptionExpenditure.csv"), encoding='utf8', index=False)
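For reference, here is a self-contained sketch of that export step (the real consEx is assembled earlier in the script; the tiny frame and base_dir below are just stand-ins for illustration):

import os
import pandas as pd

base_dir = "/Users/xxx/Dropbox/Data"  # shortened placeholder

# Stand-in for the real consEx frame built earlier in the script
consEx = pd.DataFrame({
    "ceCategory": [1, 2],
    "year": [2016, 2016],
    "countryID": [36, 36],
    "ceYoY": [1.25, -0.40],
})

out_path = os.path.join(
    base_dir, "Database/Tables/Consumption Expenditure/consumptionExpenditure.csv")
# Note: to_csv defaults to os.linesep for line endings ('\n' on macOS)
consEx.to_csv(out_path, encoding='utf8', index=False)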
Next, I upload the file to Amazon RDS (MySQL) with LOAD DATA LOCAL INFILE, into a table whose charset is also set to utf8:
CREATE TABLE IF NOT EXISTS Consumer.Expenditure (
  ceID INT NOT NULL AUTO_INCREMENT,
  ceCategory INT NOT NULL,
  year INT NULL,
  countryID INT NOT NULL,
  ceYoY DEC(15,2) NULL,
  dateCreated DATE NOT NULL,
  lastModified DATETIME NULL DEFAULT NULL,
  dateDeleted DATETIME NULL DEFAULT NULL,
  PRIMARY KEY (ceID),
  CONSTRAINT ce_fk_countries
    FOREIGN KEY (countryID)
    REFERENCES ConsumerAlpha.Countries (countryID)
    ON DELETE NO ACTION
    ON UPDATE NO ACTION)
CHARSET utf8
ENGINE = InnoDB;
SHOW WARNINGS;
Now here is the problem: the upload only works if I first open the .csv file and manually re-save it (same name, same location). Otherwise it fails silently: no error or warning is raised, but nothing is imported. After an open, save, and close, the upload works perfectly.
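To pin down what the manual re-save actually changes, I plan to compare the file's raw bytes before and after saving; a quick sketch of that check (same path as below):

# Compare the file's raw bytes before and after the manual re-save
path = "/Users/xxx/Dropbox/Data/Database/Tables/Consumption Expenditure/consumptionExpenditure.csv"
with open(path, "rb") as f:
    head = f.read(200)
print(head)  # shows whether lines end in b'\n', b'\r', or b'\r\n'
print(b"\r\n" in head, b"\r" in head, b"\n" in head)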
LOAD DATA LOCAL
INFILE '/Users/xxx/Dropbox/Data/Database/Tables/Consumption Expenditure/consumptionExpenditure.csv'
INTO TABLE Consumer.Expenditure
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r'
IGNORE 1 LINES
(ceCategory, year, countryID, ceYoY, dateCreated)
SET dateCreated = NOW();
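Eventually I'd like to drive these loads from the same Python script rather than a MySQL client, since there may be many files; a minimal sketch of that, assuming pymysql with placeholder host and credentials (local_infile must be enabled on the connection for LOCAL loads):

import pymysql

csv_path = "/Users/xxx/Dropbox/Data/Database/Tables/Consumption Expenditure/consumptionExpenditure.csv"

# Placeholder host/credentials; local_infile=True enables LOCAL loads client-side
conn = pymysql.connect(host="myrds.example.com", user="me", password="secret",
                       database="Consumer", local_infile=True)
try:
    with conn.cursor() as cur:
        cur.execute("""
            LOAD DATA LOCAL INFILE '{}'
            INTO TABLE Consumer.Expenditure
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
            LINES TERMINATED BY '\\r'
            IGNORE 1 LINES
            (ceCategory, year, countryID, ceYoY, dateCreated)
            SET dateCreated = NOW()
        """.format(csv_path))
    conn.commit()
finally:
    conn.close()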
Is there anything I'm missing that would let me avoid the manual re-save? For large tables there may be many .csv files, so opening and re-saving each one by hand isn't practical. I've looked around quite a bit, but haven't come across this particular issue anywhere. Thanks!