4

I'm trying to import a database into R to transform and load back into a different database. I am pulling my set from an RDS Mysql with:

con <- dbConnect(MySQL(),
                   user = 'user',
                   password = 'password',
                   host = 'url',
                   dbname='dbName')

sqlcmd = paste("SELECT * FROM dbName.`users`");

contentTable = dbGetQuery(con,sqlcmd);

contentTable["first_name"]

which nets me this unfortunate output

  first_name
1     Sergio
2       Sara
3   J\xfalia
4    Tatiana
5      Paula

My problem is that the third name should be coming back as Júlia. This problem occurred in other rows as well.

My locale is set as follows.

> Sys.getlocale()
[1] "pt_PT.UTF-8/pt_PT.UTF-8/pt_PT.UTF-8/C/pt_PT.UTF-8/en_US.UTF-8"

and the default character for the server is

# Variable_name, Value
'character_set_client', 'utf8'
'character_set_connection', 'utf8'
'character_set_database', 'utf8'
'character_set_filesystem', 'binary'
'character_set_results', 'utf8'
'character_set_server', 'latin1'
'character_set_system', 'utf8'
'character_sets_dir', '/rdsdbbin/oscar-5.6.10a.14.15/share/charsets/'

I am a bit lost and about to switch to Python/Panda (which gave me the correct characters but I have a bit of a learning curve to face for what I want to do next). Any idea what to do now?

Update1:

SHOW CREATE TABLE users;
CREATE TABLE `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(255) NOT NULL,
  `last_name` varchar(255) NOT NULL,
  `email` varchar(255) NOT NULL,
  `birthday` date DEFAULT NULL,
  `age` int(11) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  `state` varchar(255) DEFAULT NULL,
  `country` varchar(255) DEFAULT NULL,
  `beer_points` int(11) DEFAULT NULL,
  `access_token` text,
  `created_at` datetime NOT NULL,
  `updated_at` datetime NOT NULL,
  `profile_picture_file_name` varchar(255) DEFAULT NULL,
  `profile_picture_content_type` varchar(255) DEFAULT NULL,
  `profile_picture_file_size` int(11) DEFAULT NULL,
  `profile_picture_updated_at` datetime DEFAULT NULL,
  `role` varchar(255) DEFAULT NULL,
  `password_digest` varchar(255) DEFAULT NULL,
  `gender` varchar(255) DEFAULT NULL,
  `share_code` varchar(255) DEFAULT NULL,
  `privacy_enabled` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  UNIQUE KEY `index_users_on_email` (`email`),
  KEY `index_users_on_role` (`role`)
) ENGINE=InnoDB AUTO_INCREMENT=46 DEFAULT CHARSET=latin1
brunoban
  • 177
  • 3
  • 12
  • I can't recreate your issue after copying and pasting Júlia and inserted the name into a MySQL database. After remotely connected with R using RMySQL for a dataframe import, Júlia imported just fine. Does Júlia (with accent) show like this in MySQL? Also, check the character set of your database table: `SHOW CREATE TABLE users`. – Parfait Nov 27 '15 at 20:08
  • @Parfait It does show as Júlia in MySQL (Workbench and Sequel Pro). I added the information you asked. Thank you! – brunoban Nov 28 '15 at 19:50
  • It may be your UTF-8 system locale which I believe is Portuguese and attempting to read in latin1. Look into managing your [sys.locale()](http://stackoverflow.com/questions/23324872/rstudio-not-picking-the-encoding-im-telling-it-to-use-when-reading-a-file) and/or changing [it](http://stackoverflow.com/questions/16347731/how-to-change-the-locale-of-r-in-rstudio). – Parfait Nov 29 '15 at 02:15
  • @Parfait It was in English US before, and changing it to Portuguese was one of the things I did to try and get it to work. I'm currently doing it in Python which is doing ok I guess, but I really would prefer R =( Thanks a lot for the help Parfait – brunoban Nov 30 '15 at 13:11

1 Answers1

1

This code might be useful for your problem:

 con <- dbConnect(MySQL(),
               user = 'user',
               password = 'password',
               host = 'url',
               dbname='dbName')
 m <- dbGetQuery(con, "SET NAMES 'latin1'")
 sqlcmd <- paste("SELECT * FROM dbName.`users`");
 result <- dbGetQuery(con, sqlcmd)
 dbDisconnect(con)
mrina713
  • 429
  • 3
  • 10