I have a data frame with hundreds of columns, some of which contain only numeric values but are stored as character. I need to convert every column whose values are all numbers to numeric (the data may also contain NAs).

Example dataframe:

df <- data.frame(id = c("R1","R2","R3","R4","R5"), name=c("A","B","C","D","E"), age=c("24", "NA", "55", "19", "40"), sbp=c(174, 125, 180, NA, 130), dbp=c("106", "67", "109", "NA", "87"))

> print(df, row.names = F)
 id name age sbp dbp
 R1    A  24 174 106
 R2    B  NA 125  67
 R3    C  55 180 109
 R4    D  19  NA  NA
 R5    E  40 130  87

These columns should be numeric:
> df$age
[1] "24" "NA" "55" "19" "40"
> df$dbp
[1] "106" "67"  "109" "NA"  "87" 

I applied as.numeric(), but it also coerced the genuine character variables (id, name, etc.) to numeric, which is why they became NA:

> sapply(df,as.numeric)
     id name age sbp dbp
[1,] NA   NA  24 174 106
[2,] NA   NA  NA 125  67
[3,] NA   NA  55 180 109
[4,] NA   NA  19  NA  NA
[5,] NA   NA  40 130  87

> lapply(df,as.numeric)
$id
[1] NA NA NA NA NA

$name
[1] NA NA NA NA NA

$age
[1] 24 NA 55 19 40

$sbp
[1] 174 125 180  NA 130

$dbp
[1] 106  67 109  NA  87

What I need is to skip the genuinely character columns (id, name, ...) while looping through the data frame. Any help is much appreciated!

Mumtaj Ali

2 Answers

Try type.convert():

df2 <- type.convert(df, as.is = TRUE)

Result:

#> df2
  id name age sbp dbp
1 R1    A  24 174 106
2 R2    B  NA 125  67
3 R3    C  55 180 109
4 R4    D  19  NA  NA
5 R5    E  40 130  87

## check column classes
#> sapply(df2, class)
         id        name         age         sbp         dbp 
"character" "character"   "integer"   "integer"   "integer" 

Note that the as.is argument controls whether character columns are converted to factors: with as.is = FALSE, the first two columns (id and name) would have been converted to factors.
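
To make the effect of as.is concrete, here is a minimal sketch in base R. The data frame d and the names keep_chr and to_fct are made up for illustration and are not part of the original answer.

```r
# Small made-up data frame (not the one from the question)
d <- data.frame(name = c("A", "B", "C"),
                age  = c("24", "NA", "55"),
                stringsAsFactors = FALSE)

# as.is = TRUE: columns that cannot be parsed as numbers stay character
keep_chr <- type.convert(d, as.is = TRUE)

# as.is = FALSE: columns that cannot be parsed as numbers become factors
to_fct <- type.convert(d, as.is = FALSE)

# Compare the resulting column classes
sapply(keep_chr, class)
sapply(to_fct, class)
```

In both calls the string "NA" is read as a missing value, because it matches the default na.strings of type.convert().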

zephryl
  • The type.convert() function works fine with numeric and character data types but doesn't recognise "date" columns stored as character data. I also need to convert dates stored as character data. – Mumtaj Ali Jan 12 '23 at 04:59
  • Try `readr::type_convert(df)` instead. – zephryl Jan 12 '23 at 12:23
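
If you would rather stay in base R for the date columns raised in the comments, one possible approach is sketched below. It assumes the dates are stored in ISO format (YYYY-MM-DD); convert_dates is a hypothetical helper and the data frame d is invented for the example.

```r
# Made-up example data with an ISO-formatted date column stored as character
d <- data.frame(id    = c("R1", "R2", "R3"),
                visit = c("2023-01-12", NA, "2022-11-30"),
                age   = c("24", "NA", "55"),
                stringsAsFactors = FALSE)

# Hypothetical helper: turn a character vector into Date only if every
# non-missing value parses with the given format (assumed ISO here)
convert_dates <- function(x, format = "%Y-%m-%d") {
  if (!is.character(x)) return(x)
  parsed <- as.Date(x, format = format)
  has_values <- any(!is.na(x))
  all_parsed <- all(!is.na(parsed[!is.na(x)]))
  if (has_values && all_parsed) parsed else x
}

# First let type.convert() handle the numeric columns, then try dates
d2 <- type.convert(d, as.is = TRUE)
d2[] <- lapply(d2, convert_dates)

sapply(d2, class)
```

Columns in any other date format would need a different format string, so treat this as a starting point rather than a general solution.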

This is possible if you convert the columns by position. bind_cols() comes from dplyr, and the result is again a data frame:

library(dplyr)
df[1:2] |> bind_cols(sapply(df[3:5], as.numeric))
# id name age sbp dbp
# R1    A  24 174 106
# R2    B  NA 125  67
# R3    C  55 180 109
# R4    D  19  NA  NA
# R5    E  40 130  87
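
With hundreds of columns, hard-coding positions such as 3:5 may not be practical. The following base R sketch detects the candidate columns instead; is_numeric_like and num_cols are hypothetical names introduced here, and the rule that the string "NA" counts as missing is an assumption based on the example data.

```r
df <- data.frame(id   = c("R1", "R2", "R3", "R4", "R5"),
                 name = c("A", "B", "C", "D", "E"),
                 age  = c("24", "NA", "55", "19", "40"),
                 sbp  = c(174, 125, 180, NA, 130),
                 dbp  = c("106", "67", "109", "NA", "87"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: TRUE for a character column in which every value
# other than NA or the string "NA" can be parsed as a number
is_numeric_like <- function(x) {
  if (!is.character(x)) return(FALSE)
  vals <- x[!is.na(x) & x != "NA"]
  length(vals) > 0 && !anyNA(suppressWarnings(as.numeric(vals)))
}

# Names of the character columns that hold only numbers
num_cols <- names(df)[vapply(df, is_numeric_like, logical(1))]

# Convert just those columns; "NA" strings become real NA values
df[num_cols] <- lapply(df[num_cols],
                       function(x) suppressWarnings(as.numeric(x)))

sapply(df, class)
```

The id and name columns are left untouched because they contain at least one value that does not parse as a number.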
MarBlo