
I would like to convert multiple columns in a dataframe from factor to numeric.

Here is an example column:

unique(f$HUVEC_fitCons_score)
  [1] 0.635551 0.714379 0.638787 0.562822 0.542086 0.620976 0.56751  0.554799 0.592323 0.63947  0.627883
 [12] 0.665054 0.645665 0.492483 0.491896 0.636168 0.711    0.604944 0.613276 0.56214  0.727631 0.567892
 [23] 0.699875 0.635259 0.655142 0.733575 0.645948 0.683762 0.372554 0.249971 0.616125 0.631631 0.564101
 [34] 0.765457 0.633917 0.463824 0.664235 0.530356 0.6365   0.581474 <NA>     0.620846 0.528226 0.735409
 [45] 0.691587 0.586402 0.7233   0.651492 0.825845 0.058706 0.584449 0.572988 0.618803 0.526803 0.699908
 [56] 0.478617 0.683672 0.505526 0.741806 0.567339 0.657601 0.683535 0.581314 0.603991 0.648885 0.591603
 [67] 0.604282 0.526665 0.621717 0.830532 0.579976 0.0      0.638833 0.599892 0.6691   0.677812 0.677038
 [78] 0.756233 0.466023 0.607083 0.508809 0.322989 0.349732 0.620204 0.662026 0.678554 0.616919 0.668105
 [89] 0.755437 0.503917 0.273489 0.704051 0.525792 0.687789 0.3752   0.673998 0.421255 0.756605 0.665031
[100] 0.59522  0.447301 0.622129 0.548927 0.563494 0.550183 0.656636 0.296957 0.71     0.663205 0.836244
[111] 0.605231 0.055017 0.297325 0.574951 0.444512 0.662433 0.654926 0.757729 0.629945 0.75052  0.674467
[122] 0.76194  0.536845 0.113707 0.192219 0.52698  4.41E-4  0.057018 0.003489 0.24341  0.223356 0.166187
[133] 0.767244 0.549297 0.404113 0.062806 0.692231 0.600526 0.670209 0.264475 0.152031 0.721622 0.159066
[144] 0.375513 0.695668 0.221012 0.615788 0.413926 0.631177 0.759125 0.74596  0.650387 0.241949 0.553451
[155] 0.655445 0.341033 0.092715 0.600164 0.602482 0.075334 0.553173 0.48735  0.566403 0.421638 0.054758
[166] 0.221052 0.675733 0.128931 0.272564 0.345915 0.088406 0.078987 0.147424 0.186031 0.68491  0.342787
[177] 0.634344 0.00508  0.236223 0.016238 0.160608 0.649648 0.330827 0.725737 0.175304 0.726065 0.765956
[188] 0.666353 0.357269 0.083433 0.22236 
192 Levels: - 0.0 0.003489 0.00508 0.016238 0.054758 0.055017 0.057018 0.058706 0.062806 ... 4.41E-4

When I try as.numeric(f$HUVEC_fitCons_score), I get:

as.numeric(f$HUVEC_fitCons_score)
   [1] 123 123 169 126  80  80  72  80 113  80  80 169  85  78  95 128 116 145 129 126  61  61  60 124 169
  [26] 169 169 169 168 168 168 103 103 106  80 106  80  80  72 106  79 174 129 106 106  80  80  80  80 106
  [51]  86 103 103 174 129 164 106 122 174 174 136 103 169 169  72  72  72  72 169 169 175 169 169 169 106
  [76] 136 136  86 136 136 136 136 123 123 123  86  86 130 164 136 136 136 136 123 123 136 136 123 164 158
 [101] 136 136 136  86  86  86 136  85 128 123  47  33  47  72  72  72 124 169 169  86  86  86  72 108 108
 [126] 169 169 169 169 169 169 168 168 169 168 106 106 106 106 106  72 106 119 123 169 174 169 169 174 174
 [151] 174 174 169 168 168 169 169 169 169 169 169 169 123 175 169 169 169 168 168 168 168 168 168 168 168
 [176] 124 168 168 168 106 106 106 106 106 106 169 169 169 169 169  79 169  82  82  80 106 106  86 186 164
 [201] 124  72 169 106 106 120 106 106 106 106 106  56  95 143 124  70  70 128 169 169 169 169 169 169 169
 [226] 169 169 169 169 169 169 169 169 169 169 169 169 169 169 123  70  70  70 125 106 169 169 169 169 169
 [251] 169 123 169 123 174 123  91  82  82  82  82  82  82  82  82  NA  NA  82  82 112  82  82  82  82  82
 [276]  82  82  82 123 123  80  72 123 106 106 106 106 106 106 106  86  82  82  82  82 112 112 112 112  82
 [301]  82  82  82  82  82  82  82 106 106 106 106 123  69 158 158 158 136 136 136 136  86  86 169 176 168
 [326] 169 169 169 169  82  72  72  70  70  80 106 106 106  80 106 106 106 106  78 106  80  80  80 106 106
 [351] 106 161 108  72  93  93 168 168 168 168 176 171 134  72 169 129  72  72 169 169 169 169 169 169 169
 [376] 169 158 169 169 169 169 169 169  79  72  72  70  82  82  72  72  72  72  72 189 106 106 106   9  82
 [401]  92  82 106  82  82  82  82  82  82  82 106 106 106 106  72  72 106  87  87 158  61 169 169 169 175
 [426] 175 169 106  86  60 169 123 169 169 169 169 123 158 158  72  72 123 123  86  72  72  92  92  92  92
 [451] 106 106 110  91  47  47 169 169 169 169 169 169 169 169 161 161 169 169 169 169 169 169 169 169 169
 [476] 169 169 169 169 169 169 169 169 169 168 168 168 168 168 168 169 169  67 119  67 119  86 106 168 168
 [501] 165 136  86  86  70  70  58  58  58  72  72  72  72  72  72  72  72  72  72  72  72 119 119 119 119
 [526] 119  72  72 128 128 128 157 157 157 128 128  70 108 108  72  63 168 176 176 168 168 168 177 177 177
 [551] 168 168 171 168 168 168 168 168 176  92 106 106 106  84 169 169 169 169 128 106  70  72  72 169 169
 [576] 168 169 169 169 168 177 168 168 168 169 136 158 139 169 125 169 168 168 168 168 168 177 168 168 168
 [601] 168 176 169 161 169 174 164 156  90 101 101  82  82  72  72 125  72  72  72  72  86 106 106 106 106
 [626] 106 106  70  70  70  70  70  72  72 112 131 168  79  72 106 112 112  82  82  82  82  82 112 112 106
 [651] 169 123  94  94 169 123 169  72 106  70  70  70  70  70  72  72  70 102  79 106  70  70  66 123 123
 [676] 169 136  72  72  66  72  72  72 169 169  61 168 123 123 123 168 169 123 106 106 106 106 106  72  72
 [701] 114 158 158 158 169 169 123 136 123 190 119 123  86 106 169 123 176 128 169 168 168 169 169 169 169
 [726] 176 176 169  93  89  86  86 136 169  80  80  92  86 126  84 129 143  92  92 106 106 106 128 123 124
 [751] 124 131  70  80 106 169 169  89 136  86  60  60  82  82  82  82  80  80  80  72  72  72 113 113 113
 [776] 158 158   2 165 174 168 169 106 169 123 158 169  80 106 168 127  97 176 168 169 148 106 106 106 106
 [801] 106  82 123 123 106 106 106 106  72  72  72  70 116  86 168 128 106 128  86 169 123 123 123 136 136
 [826] 130 174  72 106 106  72 169 123 123  86 123 123 103 168 143  72 106  86  86  86  86  86 136 112  72
 [851]  72 169 154  72 106 106 177 169 169 169 169 123  60 123 123 123 123 123 158 158 169 169 169 169 169
 [876] 125  82 168 168 168  70  70  72  72  72  93  70  70  33  33 176 168 169 169 169 169 128  72  72  72
 [901]  72 169  91  91 126 176 176 176 143 119 153 106  86 106 169 169 169  79 126 126 126 126 176 171 181
 [926] 181 181 181 181 176 176 168 168  57  57 105 169 169  47 124 124  64  64 124 124 124 168  39  39  44
 [951]  94 176 169 136 136 169 168 168 177 177 168 168 168 168 123 169 169 174 111  69 169 169 169 169 169
 [976] 169 169 169 169 139 169 123 123 123 124 124 168 169 169 169 169 123  86 123 123 168 168 140  82  82
 [ reached getOption("max.print") -- omitted 30773 entries ]

This is clearly not my desired output. Are the <NA> and 4.41E-4 values messing this up? I just want to change such a column from factor to numeric because later on I am using randomForest(), which only allows a limited number of factor levels for a given feature.

Thanks.

brucezepplin
How come this column isn't already `numeric`? You should probably fix that instead of converting to numeric afterwards. If that is not possible, see: https://stackoverflow.com/q/3418128/4137985 – Cath Oct 23 '17 at 09:51
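One way to fix the column when the data is read in is sketched below. It assumes the data comes from a delimited text file and that the "-" level seen in the factor is a missing-value marker; both are guesses, and `csv_text` is a hypothetical stand-in for the real file.

```r
# Hypothetical stand-in for the real input file (assumption: "-" marks missing values)
csv_text <- "id,HUVEC_fitCons_score
a,0.635551
b,-
c,4.41E-4
d,0.0"

# Declaring "-" as an NA string lets read.csv() parse the column as numeric
# instead of falling back to character/factor
f_demo <- read.csv(text = csv_text, na.strings = c("-", "NA"))

class(f_demo$HUVEC_fitCons_score)  # numeric
```

With a real file, the same `na.strings` argument would be passed alongside the file path.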

2 Answers


You should use

as.numeric(as.character(f$HUVEC_fitCons_score))

Calling as.numeric() directly on a factor returns the underlying integer codes (1 for the first level, 2 for the second, and so on), not the numbers shown in the level labels.
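The difference can be seen on a small made-up factor (the values are illustrative, picked from the question's output, and `x` is a hypothetical name):

```r
# Toy factor with a scientific-notation label and a "-" placeholder
x <- factor(c("0.635551", "4.41E-4", "-", "0.0", "0.635551"))

# Integer codes: positions in levels(x), unrelated to the numeric labels
codes <- as.numeric(x)

# Going through character parses the labels themselves;
# "-" cannot be parsed and becomes NA (the warning is suppressed here)
values <- suppressWarnings(as.numeric(as.character(x)))

print(codes)
print(values)
```

Note that "4.41E-4" parses without trouble; only the "-" entries turn into NA.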

However, as far as I know, NA values are not allowed in randomForest() by default. You would have to adjust the argument na.action. (See ?randomForest)

Instead, you could also remove all rows holding NA values from the data set:

library(dplyr)  # only needed for the %>% pipe; na.omit() itself is base R (stats)
f <- f %>% na.omit()
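Since the question asks about several columns, the same conversion can be applied to a set of columns in one step. This is only a sketch on a made-up data frame; `f_demo`, `score_cols`, and the column names are hypothetical.

```r
# Toy data frame: two factor columns holding numbers, one genuine factor
f_demo <- data.frame(
  score_a = factor(c("0.5", "4.41E-4", "-")),
  score_b = factor(c("0.1", "0.2", "0.3")),
  label   = factor(c("x", "y", "z"))
)

# Hypothetical list of the columns that should become numeric
score_cols <- c("score_a", "score_b")

# Convert each listed column via character; unparseable labels become NA
f_demo[score_cols] <- lapply(f_demo[score_cols], function(col) {
  suppressWarnings(as.numeric(as.character(col)))
})

# Drop rows that now contain NA (base R, no pipe needed)
f_clean <- na.omit(f_demo)
str(f_clean)
```

Columns not named in `score_cols` are left untouched, so real categorical features stay factors.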
loki

I'm not sure that NA values make sense as part of your response variable. If the values are really unknown, those data points either need to be removed from the model or replaced with something else. One option at your fingertips would be to convert to numeric and then replace all NA values with the median:

response <- as.numeric(levels(f$HUVEC_fitCons_score))[f$HUVEC_fitCons_score]
response[is.na(response)] <- median(response, na.rm=TRUE)
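A small worked run of the two lines above, on made-up values (the factor `score` is hypothetical), shows what ends up in the vector:

```r
# Toy factor: three numeric labels, a "-" placeholder, and a real NA
score <- factor(c("0.2", "0.4", "-", "0.6", NA))

# Convert the levels once, then index by the factor's integer codes;
# the "-" level and the NA entry both come out as NA
response <- suppressWarnings(as.numeric(levels(score))[score])

# Replace every NA with the median of the observed values (0.4 here)
response[is.na(response)] <- median(response, na.rm = TRUE)

print(response)
```

Converting the levels rather than every element does the parsing once per distinct value, which can matter on long columns with few levels.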
Tim Biegeleisen