I have a data frame and I would like to convert the column types. I currently have this function:

library(dplyr)
library(magrittr)  # provides the %<>% operator used below

convertDfTypes <- function(obj, types) {

  for (i in seq_along(obj)) {

    FUN <- switch(types[i], character = as.character, 
                  numeric = as.numeric, 
                  factor = as.factor, 
                  integer = as.integer, 
                  POSIXct = as.POSIXct, 
                  datetime = as.POSIXct)

    name <- names(obj)[i]

    expr <- paste0("obj %<>% mutate(", name, " = FUN(", name, "))")

    eval(parse(text = expr))
  }

  return(obj)
}

myDf <- data_frame(date = seq(Sys.Date() - 4, Sys.Date(), by = 1), 
                   x = 1:5,
                   y = 6:10)

colTypes <- c("character", "character", "integer")

str(myDf)

# Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  5 obs. of  3 variables:
# $ date: Date, format: "2015-05-11" "2015-05-12" ...
# $ x   : int  1 2 3 4 5
# $ y   : int  6 7 8 9 10

myDf %>% 
  convertDfTypes(colTypes) %>% 
  str

# Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  5 obs. of  3 variables:
# $ date: chr  "2015-05-11" "2015-05-12" "2015-05-13" "2015-05-14" ...
# $ x   : chr  "1" "2" "3" "4" ...
# $ y   : int  6 7 8 9 10

(At first I used obj[,i] <- FUN(obj[,i]), but this does not work reliably with objects of class tbl.)

It works fine, although it is slow for complex type conversions (e.g. Date/datetime) on "large" data frames. But I don't know whether using eval(parse(...)) is a good idea for column replacement, and I think the function could be improved without using a for loop.

Is there a way to apply a different function to each column, like mutate_each but with a separate function per column rather than the same one for all?

Do you have any ideas for improving the function?

Thank you

Julien Navarre
  • You are more likely to receive an answer if you provide a minimal reproducible example. See Hadley's reply here: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Paul Rougieux May 15 '15 at 10:05

2 Answers

Here is a more general way to achieve your goal of transforming column types:

Suppose you want to convert all your integer columns to numeric. You can do so in a single pipe:

myDf %>%  mutate_each_( funs(as.numeric(.)), names( .[,sapply(., is.integer)] ))
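
Note that `mutate_each_` and `funs` have since been deprecated in dplyr. A minimal sketch of the same idea, assuming dplyr 1.0.0 or later (where `across()` can be used with `where()`), might look like the following. `myDf` is rebuilt here only so the snippet is self-contained, and `converted` is just an illustrative name:

```r
library(dplyr)

# Rebuild the example data from the question (tibble() replaces data_frame()).
myDf <- tibble(date = seq(Sys.Date() - 4, Sys.Date(), by = 1),
               x = 1:5,
               y = 6:10)

# Convert every integer column to numeric (double); other columns are untouched.
converted <- myDf %>%
  mutate(across(where(is.integer), as.numeric))

str(converted)
```

This still applies one function to a set of columns selected by a predicate, so it does not by itself cover the case of a different conversion for each column.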
Nick

Create a data frame. Each column will be of type factor (with R versions before 4.0.0, where stringsAsFactors defaults to TRUE):

numbers <- c("2001" ,"2002" ,"2002" ,"2002" ,"2003" ,"2005")

dates_string <- c("01-01-1989","01-07-1989","01-08-1989","01-09-1989",
"01-10-1989","01-11-1989")

gender <- c("male" , "female" ,"male" , "female" , "male" , "female")

df <- data.frame(numbers = numbers , dates_string = dates_string , gender = gender)

Check the structure of the data frame:

str(df)

Use the transmute function from the dplyr package. It creates new columns using the functions you specify and drops the columns of the old data frame:

library("dplyr")

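# as.character() first, since as.numeric() on a factor returns its integer codes;
# the date format (day-month-year) is an assumption about these strings
df_new <- transmute(df, numbers_new = as.numeric(as.character(numbers)),
  dates_new = as.Date(dates_string, format = "%d-%m-%Y"), gender_new = as.factor(gender))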

Check the structure of the newly created data frame:

str(df_new)
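
If the conversions should be driven by a vector of type names rather than written out by hand, one possible approach is to look up a converter per column and apply them pairwise with `Map`. This is only a sketch in base R, not a dplyr feature: the helper name `convertDfTypes2` and the example data frame `df_types` are hypothetical, and the set of supported type names simply mirrors the `switch` in the question.

```r
# Hypothetical helper: convert each column of `obj` according to `types`,
# a character vector holding one type name per column.
convertDfTypes2 <- function(obj, types) {
  stopifnot(length(types) == ncol(obj))

  # Lookup table from type name to conversion function.
  converters <- list(character = as.character,
                     numeric   = as.numeric,
                     factor    = as.factor,
                     integer   = as.integer,
                     POSIXct   = as.POSIXct,
                     datetime  = as.POSIXct)

  unknown <- setdiff(types, names(converters))
  if (length(unknown) > 0) {
    stop("Unsupported type(s): ", paste(unknown, collapse = ", "))
  }

  # Apply the i-th converter to the i-th column. Assigning into obj[]
  # keeps the column names and the class of obj.
  obj[] <- Map(function(column, type) converters[[type]](column), obj, types)
  obj
}

# Example data with fixed dates so the result is reproducible.
df_types <- data.frame(date = seq(as.Date("2015-05-11"), by = 1, length.out = 5),
                       x = 1:5,
                       y = 6:10)

res <- convertDfTypes2(df_types, c("character", "character", "integer"))
str(res)
```

This avoids both `eval(parse(...))` and an explicit for loop, at the cost of requiring one type name for every column.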
Nader Hisham
  • I'm aware of window functions (I use `mutate` in my function) and I don't want to convert every column of my data frame manually. I would like to do it with a function. This is what my function currently does, but not in an optimized way. – Julien Navarre May 15 '15 at 12:33