11

I use RMySQL and a MySQL database to store my datasets. Sometimes data gets revised or I store results back to the database as well. Long story short, there is quite some interaction between R and the database in my use case.

Most of the time I use convenience functions like dbWriteTable and dbReadTableto write and read my data. Unfortunately these are just completely ignoring R data types and the MySQL field types. I mean I would expect that MySQL date fields end up in a Date or POSIX class. The other way around I´d think that these R classes are stored as a somewhat corresponding MySQL field type. That means a date should not be character – I do not expect to distinguish between float and doubles here...

I also tried to use dbGetQuery – same result there. Is there something I have completely missed when reading the manual or is it simply not possible (yet) in these packages? What would by a nice work around?

EDIT: @mdsummer I tried to find something more in the documentation, but found only these disappointing lines: `MySQL tables are read into R as data.frames, but without coercing character or logical data into factors. Similarly while exporting data.frames, factors are exported as character vectors.

Integer columns are usually imported as R integer vectors, except for cases such as BIGINT or UNSIGNED INTEGER which are coerced to R's double precision vectors to avoid truncation (currently R's integers are signed 32-bit quantities).

Time variables are imported/exported as character data, so you need to convert these to your favorite date/time representation.

Matt Bannert
  • 27,631
  • 38
  • 141
  • 207
  • Have you exhausted the documentation for those functions? There's quite a bit of detail in regards to "field.types" for the read and write functions, and how these are inferred using dbDatatype. – mdsumner Feb 23 '11 at 13:57
  • "exhausted" well put. I doubt that I really really have done that, but a good pointer often helps to do better at that. thx! – Matt Bannert Feb 23 '11 at 14:07
  • 1
    Any chance of a concrete example with some code? – Spacedman Feb 23 '11 at 14:20
  • @spacedman: `mytable <- getDbTable(con,"mytable")` :). returns character classes instead of Date or POSIXlt. It's more a general question, I am not so sure a) if is missed something on the manual like mdsummer suggested, b) or if i should start coding (in that case I ask: where to start). – Matt Bannert Feb 23 '11 at 15:03
  • 1
    @matt-bannert 4 years later `RMySQL::dbReadTable` still returns warnings when converting data types such as "Unsigned INTEGER in col 5 imported as numeric." Since you posted your answer below, did you find a nice way to describe type conversion from MySQL to a R data frame? I have tried to look at `DBI::dbReadTable` the conversion of column names is documented but not the conversion of types. – Paul Rougieux Jan 16 '15 at 08:33
  • I am sorry, not so far. I rather switched to postgreSQL and the corresponding `RPostgreSQL` package. I just learned that `RMySQL` has two well known new maintainers: http://blog.rstudio.org/2015/01/09/rmysql-0-1-0/ . Maybe something new is coming up with their releases ... – Matt Bannert Jan 17 '15 at 00:25

2 Answers2

5

Ok, I got a working solution now. Here's a function that maps MySQL field types to R classes. This helps in particular handling the MySQL field type date...

dbReadMap <- function(con,table){
    statement <- paste("DESCRIBE ",table,sep="")
    desc <- dbGetQuery(con=con,statement)[,1:2]

  # strip row_names if exists because it's an attribute and not real column
  # otherweise it causes problems with the row count if the table has a row_names col
  if(length(grep(pattern="row_names",x=desc)) != 0){
  x <- grep(pattern="row_names",x=desc)
  desc <- desc[-x,]
  }



    # replace length output in brackets that is returned by describe
    desc[,2] <- gsub("[^a-z]","",desc[,2])

    # building a dictionary 
    fieldtypes <- c("int","tinyint","bigint","float","double","date","character","varchar","text")
    rclasses <- c("as.numeric","as.numeric","as.numeric","as.numeric","as.numeric","as.Date","as.character","as.character","as.character") 
    fieldtype_to_rclass = cbind(fieldtypes,rclasses)

    map <- merge(fieldtype_to_rclass,desc,by.x="fieldtypes",by.y="Type")
    map$rclasses <- as.character(map$rclasses)
    #get data
    res <- dbReadTable(con=con,table)



    i=1
    for(i in 1:length(map$rclasses)) {
        cvn <- call(map$rclasses[i],res[,map$Field[i]])
        res[map$Field[i]] <- eval(cvn)
    }


    return(res)
}

Maybe this is not good programming practice – I just don't know any better. So, use it at your own risk or help me to improve it... And of course it's only half of it: reading. Hopefully I´ll find some time to write a writing function soon.

If you have suggestions for the mapping dictionary let me know :)

Matt Bannert
  • 27,631
  • 38
  • 141
  • 207
1

Here is a more generic function of the function of @Matt Bannert that works with queries instead of tables:

# Extension to dbGetQuery2 that understands MySQL data types
dbGetQuery2 <- function(con,query){
    statement <- paste0("CREATE TEMPORARY TABLE `temp` ", query)
    dbSendQuery(con, statement)
    desc <- dbGetQuery(con, "DESCRIBE `temp`")[,1:2]
    dbSendQuery(con, "DROP TABLE `temp`")

    # strip row_names if exists because it's an attribute and not real column
    # otherweise it causes problems with the row count if the table has a row_names col
    if(length(grep(pattern="row_names",x=desc)) != 0){
        x <- grep(pattern="row_names",x=desc)
        desc <- desc[-x,]
    }

    # replace length output in brackets that is returned by describe
    desc[,2] <- gsub("[^a-z]","",desc[,2])

    # building a dictionary 
    fieldtypes <- c("int",        "tinyint",    "bigint",     "float",      "double",     "date",    "character",    "varchar",   "text")
    rclasses <-   c("as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.Date", "as.character", "as.factor", "as.character") 
    fieldtype_to_rclass = cbind(fieldtypes,rclasses)

    map <- merge(fieldtype_to_rclass,desc,by.x="fieldtypes",by.y="Type")
    map$rclasses <- as.character(map$rclasses)
    #get data
    res <- dbGetQuery(con,query)

    i=1
    for(i in 1:length(map$rclasses)) {
        cvn <- call(map$rclasses[i],res[,map$Field[i]])
        res[map$Field[i]] <- eval(cvn)
    }

    return(res)
}
ROLO
  • 4,183
  • 25
  • 41