R: numerics coerced into characters while reading data in transposed form, so how to easily convert things back?

Question

Say I have a data file in transposed form (i.e. rows are variables, columns are observations), like this:

name    A    B    C  
gender  M    F    M  
age     25   26   27

I read the file into R: dat <- read.table(datafile, row.names=1, as.is=TRUE). Since a data.frame needs values of a single type in each column, and every column here mixes text and numbers, each column is read as character, so the "age" values end up as strings.

Then I transpose dat back to "normal" form: dat_t <- t(dat). Now "age" is a column, but its values are still characters.
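
A minimal reproduction of the problem, assuming the three-row sample above is passed inline through the text argument of read.table instead of a file (the inline string is an illustration, not the asker's actual data):

```r
# Assumption: the sample rows from the question, supplied inline
dat <- read.table(text = "name A B C
gender M F M
age 25 26 27", row.names = 1, as.is = TRUE)

# Every column mixes text and numbers, so each one is read as character
sapply(dat, class)

# t() on a data.frame returns a matrix; here it is a character matrix
dat_t <- t(dat)
is.matrix(dat_t)
is.character(dat_t)

# The "age" column is present, but still holds strings
dat_t[, "age"]
```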

My real source data is large, with many rows that should be numeric interspersed among many character rows. After the transposition, how can I easily convert every column to the type it should have?

Thank you. This is my first question so I'm not very good at searching for previous answers or asking proper questions. I apologize in advance if the question is trivial or duplicated.

Related - http://stackoverflow.com/questions/17288197/reading-a-csv-file-organized-horizontally — thelatemail, Jul 26 '16 at 05:47
Thanks, the link you provided shows another possible way. In that approach the data are transposed and then read again, so the types are corrected automatically on the second read. — Kenneth Cheng, Jul 26 '16 at 15:29
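
The transpose-then-read-again idea from the comment above could be sketched roughly as follows. This is not the code from the linked question; it assumes the sample data is supplied inline and uses capture.output with write.table to avoid a temporary file. The name reread is hypothetical.

```r
# Assumption: the sample rows from the question, supplied inline
dat <- read.table(text = "name A B C
gender M F M
age 25 26 27", row.names = 1, as.is = TRUE)

# Write the transposed matrix out as tab-separated text, captured as lines
txt <- capture.output(
  write.table(t(dat), quote = FALSE, sep = "\t", row.names = FALSE)
)

# Reading the text again lets read.table infer each column's type
reread <- read.table(text = txt, header = TRUE, sep = "\t", as.is = TRUE)
sapply(reread, class)
```

One caveat with any automatic conversion: a column whose values are all T/F or TRUE/FALSE would be read as logical, which may not be what is wanted for a coded variable.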

score 1 · Accepted Answer · answered Jul 26 '16 at 05:01

type.convert is the function you're looking for, but you need to apply it to each column. In base, this leads to constructions like

data.frame(lapply(data.frame(t(dat), stringsAsFactors = FALSE), type.convert, as.is = TRUE))
##   name gender age
## 1    A      M  25
## 2    B      F  26
## 3    C      M  27

which is a little ridiculous, though it works. If you add on purrr, you can do slightly better:

library(purrr)

dat %>% t() %>% 
    data.frame(stringsAsFactors = FALSE) %>% 
    map_df(type.convert, as.is = TRUE)
## # A tibble: 3 x 3
##    name gender   age
##   <chr>  <chr> <int>
## 1     A      M    25
## 2     B      F    26
## 3     C      M    27

or with tibble (or all of dplyr), you can use as_data_frame so you don't need stringsAsFactors = FALSE:

library(tibble)

dat %>% t() %>% 
    as_data_frame() %>% 
    map_df(type.convert, as.is = TRUE)
## # A tibble: 3 x 3
##    name gender   age
##   <chr>  <chr> <int>
## 1     A      M    25
## 2     B      F    26
## 3     C      M    27

or with full dplyr,

library(dplyr)

dat %>% t() %>% 
    as_data_frame() %>% 
    mutate_all(type.convert, as.is = TRUE)
## # A tibble: 3 x 3
##    name gender   age
##   <chr>  <chr> <int>
## 1     A      M    25
## 2     B      F    26
## 3     C      M    27

If you drop as.is = TRUE, you'll get factors instead of strings.
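
A small sketch of the effect of as.is on individual character vectors; the vectors below are made up to mirror the sample columns:

```r
# Hypothetical vectors mirroring the "gender" and "age" columns
gender <- c("M", "F", "M")
age <- c("25", "26", "27")

# With as.is = TRUE, non-numeric strings stay character
class(type.convert(gender, as.is = TRUE))

# With as.is = FALSE, they become a factor
class(type.convert(gender, as.is = FALSE))

# Strings that all parse as whole numbers become integer either way
class(type.convert(age, as.is = TRUE))
```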

Thank you. So in a word the key is the "magic" type.convert, no matter in which specific way you do it:) — Kenneth Cheng, Jul 26 '16 at 15:23
Pretty much. Even if you hack it through `read.table`, `type.convert` is still getting called. — alistaire, Jul 26 '16 at 15:33

score 1 · Answer 2 · answered Jul 26 '16 at 06:09

Here is an option with data.table

library(data.table)
as.data.table(t(dat), keep.rownames=TRUE)[, setNames(lapply(.SD[-1], 
          type.convert), unlist(.SD[1]))]
#    name gender age
#1:    A      M  25
#2:    B      F  26
#3:    C      M  27

akrun

Thank you, your answer is quite valuable too. So the key still lies in "type.convert". – Kenneth Cheng Jul 26 '16 at 15:26
@KennethCheng At present that is the only function that does change the type dynamically (to my knowledge) – akrun Jul 26 '16 at 15:27