I have a data frame and am trying to convert its factor columns to integer levels so that I can assign different values to the integer columns and sum them up. The data frame is shown below:

SYMBOL             PVALUE1             PVALUE2             PVALUE3
1      <NA>                   0                   0                   0
2      A1BG   0.570568468332992   0.638946096507843   0.500000024038353
3  A1BG-AS1   0.934667427815304   0.535423659692795   0.761546785262885
4      A1CF  0.0547554690106211  0.0774299636523817   0.065332572184696
5       A2M 0.00167287722066533 0.00167287722066533 0.00221961024320294
6   A2M-AS1  0.0029235940742491 0.00167287722066533 0.00167287722066533
7     A2ML1  0.0911572830792113   0.106611564406736  0.0911572830792113
8     A2MP1   0.361053903492157   0.361053903492157   0.266847309723377
9    A4GALT   0.120560747291588  0.0774299636523817  0.0911572830792113
10    A4GNT   0.123873163210698  0.0911572830792113  0.0911572830792113
11     AA06   0.143001729123466   0.106611564406736   0.143001729123466

str(df)
    'data.frame':   21761 obs. of  4 variables:
     $ SYMBOL : chr  NA "A1BG" "A1BG-AS1" "A1CF" ...
     $ PVALUE1: Factor w/ 209 levels "0","0.000109570493049298",..: 1 145 181 74 22 27 84 126 93 94 ...
     $ PVALUE2: Factor w/ 216 levels "0","0.000109570493049298",..: 1 157 147 86 20 20 93 134 86 90 ...
     $ PVALUE3: Factor w/ 207 levels "0","0.000126830713374967",..: 1 140 161 81 25 22 89 118 89 89 ...

I tried the following code, given in "Convert factor to integer in a data frame":

df1 <- sapply(df, function(x) if(is.factor(df)) {
  as.numeric(as.character(df))
} else {
  df
})

But it does not seem to work; it gives me output like this:

 SYMBOL          PVALUE1         PVALUE2         PVALUE3        
SYMBOL  Character,21761 Character,21761 Character,21761 Character,21761
PVALUE1 factor,21761    factor,21761    factor,21761    factor,21761   
PVALUE2 factor,21761    factor,21761    factor,21761    factor,21761   
PVALUE3 factor,21761    factor,21761    factor,21761    factor,21761

When I try str(df1), I get:

List of 16
 $ : chr [1:21761] "0" "A1BG" "A1BG-AS1" "A1CF" ...
 $ : Factor w/ 209 levels "0","0.000109570493049298",..: 1 145 181 74 22 27 84 126 93 94 ...
 $ : Factor w/ 216 levels "0","0.000109570493049298",..: 1 157 147 86 20 20 93 134 86 90 ...
 $ : Factor w/ 207 levels "0","0.000126830713374967",..: 1 140 161 81 25 22 89 118 89 89 ...
 $ : chr [1:21761] "0" "A1BG" "A1BG-AS1" "A1CF" ...
 $ : Factor w/ 209 levels "0","0.000109570493049298",..: 1 145 181 74 22 27 84 126 93 94 ...
 $ : Factor w/ 216 levels "0","0.000109570493049298",..: 1 157 147 86 20 20 93 134 86 90 ...
 $ : Factor w/ 207 levels "0","0.000126830713374967",..: 1 140 161 81 25 22 89 118 89 89 ...
 $ : chr [1:21761] "0" "A1BG" "A1BG-AS1" "A1CF" ...
 $ : Factor w/ 209 levels "0","0.000109570493049298",..: 1 145 181 74 22 27 84 126 93 94 ...
 $ : Factor w/ 216 levels "0","0.000109570493049298",..: 1 157 147 86 20 20 93 134 86 90 ...
 $ : Factor w/ 207 levels "0","0.000126830713374967",..: 1 140 161 81 25 22 89 118 89 89 ...
 $ : chr [1:21761] "0" "A1BG" "A1BG-AS1" "A1CF" ...
 $ : Factor w/ 209 levels "0","0.000109570493049298",..: 1 145 181 74 22 27 84 126 93 94 ...
 $ : Factor w/ 216 levels "0","0.000109570493049298",..: 1 157 147 86 20 20 93 134 86 90 ...
 $ : Factor w/ 207 levels "0","0.000126830713374967",..: 1 140 161 81 25 22 89 118 89 89 ...
 - attr(*, "dim")= int [1:2] 4 4
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:4] "SYMBOL" "PVALUE1" "PVALUE2" "PVALUE3"
  ..$ : chr [1:4] "SYMBOL" "PVALUE1" "PVALUE2" "PVALUE3"

Please let me know what I am missing.

AwaitedOne

1 Answer

You can try

indx <- sapply(df, is.factor)
df[indx] <- lapply(df[indx], function(x) as.numeric(as.character(x)))

Or a faster option would be

df[indx] <- lapply(df[indx], function(x) as.numeric(levels(x))[x])
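
To see why the attempt in the question fails and what the lines above do, here is a minimal sketch on a toy data frame (the values are made up for illustration, not taken from the real data). In the original attempt the anonymous function tests is.factor(df) on the whole data frame instead of on the column x, so the test is always FALSE and df itself is returned for every column; sapply then simplifies those four copies into the 4 x 4 list shown in the question. The sketch also shows the usual pitfall that calling as.numeric() directly on a factor returns its internal integer codes rather than the values.

```r
# Toy data frame mimicking the question's structure (illustrative values only)
df <- data.frame(
  SYMBOL  = c(NA, "A1BG", "A1CF"),
  PVALUE1 = factor(c("0", "0.57", "0.05")),
  PVALUE2 = factor(c("0", "0.64", "0.08")),
  stringsAsFactors = FALSE
)

# Pitfall: as.numeric() on a factor gives the integer codes, not the values
codes <- as.numeric(df$PVALUE1)   # 1 3 2

# Test each column (x) for being a factor, not the whole data frame
indx <- sapply(df, is.factor)

# Convert the levels to numeric once, then index them by the factor's codes
df[indx] <- lapply(df[indx], function(x) as.numeric(levels(x))[x])

str(df)
```

Using lapply with df[indx] <- ... keeps the result a data frame and leaves the character column SYMBOL untouched, whereas sapply would try to simplify the result into a matrix.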
akrun