
Say I have a data file in transposed form (i.e. rows are variables, columns are observations), like this:

name    A    B    C  
gender  M    F    M  
age     25   26   27

I read the file into R: dat <- read.table(datafile, row.names=1, as.is=TRUE). Since a data.frame requires every column to hold a single type, the values in the "age" row are coerced to character.

I then transpose dat back to "normal" form: dat_t <- t(dat). Now "age" is a column, but its values are still character strings (t() returns a character matrix).

My real source data is large, with many rows that should be numeric interspersed among many character rows. After the transposition, how can I easily convert every column to the type it should have?

Thank you. This is my first question so I'm not very good at searching for previous answers or asking proper questions. I apologize in advance if the question is trivial or duplicated.

Kenneth Cheng
  • Related - http://stackoverflow.com/questions/17288197/reading-a-csv-file-organized-horizontally – thelatemail Jul 26 '16 at 05:47
  • Thanks for that too, that's another possible way too in the link you provided. In that approach basically the data are transposed and then read again so the types are automatically corrected after the second read. – Kenneth Cheng Jul 26 '16 at 15:29
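The re-read approach mentioned in these comments can be sketched in base R roughly as follows. This is only an illustration under assumptions: the inline text txt stands in for the real data file, and the names txt, out_lines and dat_t are hypothetical.

```r
# Sample data in the transposed layout from the question (illustrative only).
txt <- "name    A    B    C
gender  M    F    M
age     25   26   27"

# First read: every column mixes text and numbers, so all values stay character.
dat <- read.table(text = txt, row.names = 1, as.is = TRUE)

# Transpose, write the matrix out as tab-separated lines, and capture them.
out_lines <- capture.output(
  write.table(t(dat), quote = FALSE, sep = "\t", row.names = FALSE)
)

# Second read: read.table() now infers the type of each column on its own.
dat_t <- read.table(text = out_lines, header = TRUE, sep = "\t", as.is = TRUE)

str(dat_t)
```

For a real file, text = txt would be replaced by the file path. The round trip through text costs an extra parse, which may matter for very large inputs.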

2 Answers


type.convert is the function you're looking for, but you need to apply it to each column. In base R, this leads to constructions like

data.frame(lapply(data.frame(t(dat), stringsAsFactors = FALSE), type.convert, as.is = TRUE))
##   name gender age
## 1    A      M  25
## 2    B      F  26
## 3    C      M  27

which is a little ridiculous, though it works. If you add on purrr, you can do slightly better:

library(purrr)

dat %>% t() %>% 
    data.frame(stringsAsFactors = FALSE) %>% 
    map_df(type.convert, as.is = TRUE)
## # A tibble: 3 x 3
##    name gender   age
##   <chr>  <chr> <int>
## 1     A      M    25
## 2     B      F    26
## 3     C      M    27

or with tibble (or all of dplyr), you can use as_data_frame (as_tibble in current versions of tibble, where as_data_frame is deprecated) so you don't need stringsAsFactors = FALSE:

library(tibble)

dat %>% t() %>% 
    as_data_frame() %>% 
    map_df(type.convert, as.is = TRUE)
## # A tibble: 3 x 3
##    name gender   age
##   <chr>  <chr> <int>
## 1     A      M    25
## 2     B      F    26
## 3     C      M    27

or with full dplyr,

library(dplyr)

dat %>% t() %>% 
    as_data_frame() %>% 
    mutate_all(type.convert, as.is = TRUE)
## # A tibble: 3 x 3
##    name gender   age
##   <chr>  <chr> <int>
## 1     A      M    25
## 2     B      F    26
## 3     C      M    27

If you drop as.is = TRUE, you'll get factors instead of strings.
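For a version that can be run without a data file, here is a minimal base R sketch of the per-column conversion. It assumes a reasonably recent R in which type.convert() also accepts a whole data frame; the inline sample text and the names txt, dat_t, converted and converted2 are illustrative, not part of the original data.

```r
# Sample data in the transposed layout from the question (illustrative only).
txt <- "name    A    B    C
gender  M    F    M
age     25   26   27"

dat <- read.table(text = txt, row.names = 1, as.is = TRUE)

# t() returns a character matrix; turn it back into a data frame of strings.
dat_t <- as.data.frame(t(dat), stringsAsFactors = FALSE)

# Convert column by column, keeping the data frame structure intact.
converted <- dat_t
converted[] <- lapply(dat_t, type.convert, as.is = TRUE)

# Assumption: recent R versions provide a data frame method for type.convert(),
# which does the same conversion in a single call.
converted2 <- type.convert(dat_t, as.is = TRUE)

sapply(converted, class)
```

Assigning into converted[] keeps the row names and column names, so only the column types change.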

alistaire

Here is an option with data.table:

library(data.table)
as.data.table(t(dat), keep.rownames=TRUE)[, setNames(lapply(.SD[-1], 
          type.convert), unlist(.SD[1]))]
#    name gender age
#1:    A      M  25
#2:    B      F  26
#3:    C      M  27
akrun