I need to convert my data frame into a numeric matrix. However, when I use the `data.matrix()` function, the decimal values in some columns are replaced by unrelated integers, and I have no idea why. Can someone explain what's happening?

> head(x[,1:5])
         TCGA-AA-3520-01A-01R-0821-07 TCGA-AA-3532-01A-01R-0821-07 TCGA-AA-3553-01A-01R-0821-07 TCGA-A6-2674-01A-02R-0821-07 TCGA-AA-3521-01A-01R-0821-07
ELMO2              -0.840833333333333                        0.018            0.354916666666667                    -0.203750                    0.6890000
CREB3L1                         1.333                       0.7625                      0.13475                     2.498750                    1.1572500
RPS11                          1.4755                       0.3245                        0.634                     0.483125                    0.9526250
PNMA1                        -1.39075                     -1.48725                      -0.8305                    -0.463250                   -2.2230000
MMP2               0.0278333333333333                      -0.2065           0.0666666666666666                     2.156000                    0.1501667
C10orf90                      -2.5495                     -2.76575                     -2.76375                    -2.482250                   -2.1107500
> head(data.matrix(x[,1:5]))
         TCGA-AA-3520-01A-01R-0821-07 TCGA-AA-3532-01A-01R-0821-07 TCGA-AA-3553-01A-01R-0821-07 TCGA-A6-2674-01A-02R-0821-07 TCGA-AA-3521-01A-01R-0821-07
ELMO2                            3323                           94                         1701                    -0.203750                    0.6890000
CREB3L1                          4307                         3022                          654                     2.498750                    1.1572500
RPS11                            4485                         1458                         2786                     0.483125                    0.9526250
PNMA1                            4379                         4438                         3397                    -0.463250                   -2.2230000
MMP2                              155                          932                          328                     2.156000                    0.1501667
C10orf90                         5139                         5193                         5230                    -2.482250                   -2.1107500
> class(x)
[1] "data.frame"

> str(x)
'data.frame':   6150 obs. of  174 variables:
 $ TCGA-AA-3520-01A-01R-0821-07: Factor w/ 5538 levels "","0","0.000166666666666662",..: 3323 4307 4485 4379 155 5139 4177 1400 4735 3363 ...
 $ TCGA-AA-3532-01A-01R-0821-07: Factor w/ 5597 levels "","0.000499999999999968",..: 94 3022 1458 4438 932 5193 1374 2757 4671 2503 ...
 $ TCGA-AA-3553-01A-01R-0821-07: Factor w/ 5550 levels "","0.000249999999999995",..: 1701 654 2786 3397 328 5230 65 194 4900 3966 ...
 $ TCGA-A6-2674-01A-02R-0821-07: num  -0.204 2.499 0.483 -0.463 2.156 ...
 $ TCGA-AA-3521-01A-01R-0821-07: num  0.689 1.157 0.953 -2.223 0.15 ...
 $ TCGA-AA-3534-01A-01R-0821-07: num  -0.6789 -0.0877 1.5736 -1.6678 -0.7148 ...
 $ TCGA-AA-3555-01A-01R-0821-07: Factor w/ 5580 levels "","-0.00012499999999999",..: 373 4970 2076 519 1344 5084 3882 1285 4760 2778 ...
 $ TCGA-A6-2670-01A-02R-0821-07: num  0.588 0.569 0.808 -1.661 1.073 ...
 $ TCGA-A6-2683-01A-01R-0821-07: num  -0.77 0.741 1.564 -2.984 -1.569 ...
 $ TCGA-AA-3526-01A-02R-0821-07: num  -0.824 2.215 0.819 -1.846 -0.862 ...
 $ TCGA-A6-2677-01A-01R-0821-07: num  -0.733 0.526 0.892 -1.598 -1.69 ...
 $ TCGA-AA-3522-01A-01R-0821-07: num  -0.981 2.094 0.818 -1.048 -1.452 ...
 $ TCGA-AA-3538-01A-01R-0821-07: num  -0.144 0.631 0.794 -1.523 -0.198 ...
 $ TCGA-AA-3556-01A-01R-0821-07: Factor w/ 5556 levels "","-0.000125000000000014",..: 2256 4772 3446 4253 4040 4927 3026 316 3766 3221 ...
 $ TCGA-A6-2678-01A-01R-0821-07: num  -1.38 1.706 1.103 -2.725 -0.918 ...
 $ TCGA-AA-3524-01A-02R-0821-07: Factor w/ 5611 levels "","-0.0005","0.000500000000000006",..: 4062 3671 4749 4751 4051 5226 2623 1227 4252 1489 ...
 $ TCGA-AA-3542-01A-02R-0821-07: num  -1.195 0.641 1.952 -1.63 -1.264 ...
 $ TCGA-AA-3558-01A-01R-0821-07: Factor w/ 5580 levels "","0.000375000000000007",..: 4245 3920 4277 4910 4766 5126 1450 3350 4898 1915 ...
 $ TCGA-AA-3544-01A-01R-0821-07: num  -0.157 0.649 0.937 -1.941 -1.417 ...
 $ TCGA-AA-3560-01A-01R-0821-07: num  -0.146 0.554 0.581 -2.503 -0.438 ...
 $ TCGA-AA-3514-01A-02R-0821-07: Factor w/ 5678 levels "","0","0.000375000000000028",..: 3800 2056 2422 1158 1507 4620 3564 1877 5480 4076 ...
 $ TCGA-AA-3527-01A-01R-0821-07: num  -0.3973 -0.0915 1.4019 -2.5513 -0.395 ...
 $ TCGA-AA-3548-01A-01R-0821-07: Factor w/ 5470 levels "","0.000100000000000011",..: 2590 3817 3388 4531 2770 4922 2715 406 4473 2711 ...
 $ TCGA-AA-3561-01A-01R-0821-07: num  -1.115 1.01 1.266 -1.419 -0.537 ...
 $ TCGA-AA-3517-01A-01R-0821-07: Factor w/ 5604 levels "","-0.000333333333333335",..: 479 1182 4514 5003 4005 4799 1499 4796 849 3079 ...
 $ TCGA-AA-3529-01A-02R-0821-07: Factor w/ 5583 levels "","-0.000124999999999978",..: 2912 3970 4073 4555 4257 5238 3242 2668 899 3508 ...
 $ TCGA-AA-3549-01A-02R-0821-07: Factor w/ 5538 levels "","0.000166666666666671",..: 1378 4762 4356 4857 519 4739 1254 4777 350 444 ...
 $ TCGA-AA-3562-01A-02R-0821-07: Factor w/ 5628 levels "","0","0.000249999999999993",..: 2453 3556 3523 4987 2236 5148 1681 1854 2249 4096 ...
Jay
  • Perhaps some columns are factors, not numeric. What does `str(x)` say? I assume `data.matrix` means `as.matrix`? – lukeA Mar 04 '15 at 21:25
  • @lukeA: `data.matrix` creates a numeric matrix from a data frame. – Alex A. Mar 04 '15 at 21:26
  • @lukeA I added the `str(x)` output to the original post. Do you have any idea why certain columns are being read in as a different class? `data.matrix` converts a data.frame into a numeric matrix. – Jay Mar 04 '15 at 21:28
  • This looks like one of the commonest R problems. The imported data has some columns with missing values coded by a string other than NA, so the whole column has been imported as a factor rather than numeric. When you coerce it to a matrix, the integer codes of the factor levels are shown rather than the labels. The solution is to look at your data file (or try `summary` on those columns), identify the missing-value string, and set it as such in the `read.delim` or `read.table` options. Good luck. – Stephen Henderson Mar 04 '15 at 21:35

1 Answer

The data.matrix() function converts factors to numbers by using their internal integer codes. That's why the columns that str() lists as factors show different values after data.matrix(), while the columns that are already numeric are unchanged. To create a numeric matrix in this situation, try this:

y <- apply(as.matrix(x[, 1:5]), 2, as.numeric)

When as.matrix() is applied to a data frame that contains factors, the factors become strings and the result is a character matrix. apply() then converts each column to numeric while keeping the matrix shape.
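
To see the difference between the level codes and the actual values, here is a minimal sketch on a toy data frame. The data and the helper name `to_num` are made up for illustration. It converts only the factor columns through `as.character()`, so columns that are already numeric are not round-tripped through text, and it restores the row names afterwards because `sapply()` does not carry them over here.

```r
# Toy data (hypothetical): column `a` holds numbers stored as a factor,
# column `b` is already numeric.
x <- data.frame(
  a = factor(c("-0.84", "1.333", "", "0.0278")),
  b = c(-0.20375, 2.49875, 0.483125, 2.156),
  row.names = c("ELMO2", "CREB3L1", "RPS11", "MMP2")
)

# data.matrix() uses the internal integer codes of the factor levels.
codes <- data.matrix(x)

# Convert factor columns via their labels; leave numeric columns as they are.
to_num <- function(col) {
  if (is.factor(col)) as.numeric(as.character(col)) else as.numeric(col)
}

# sapply() over the columns returns a numeric matrix with column names.
y <- sapply(x, to_num)
rownames(y) <- rownames(x)  # put the gene names back
```

Entries that cannot be parsed as numbers, such as the empty string in the toy data, end up as `NA` in `y`.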

As Stephen Henderson mentioned in his comment, it's a good idea to try to figure out why the numeric values stored in your data frame are being treated as factors.
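
One way to track this down is sketched below, under assumptions. The inline tab-delimited text stands in for the real file, and `"null"` is a made-up missing-value marker, since the post does not show which string the actual file uses. The helper `non_numeric` is also hypothetical. It lists the strings that fail to parse as numbers, which can then be passed to `na.strings` when re-reading the data.

```r
# Hypothetical input: "null" stands in for the unknown missing-value marker.
raw <- "gene\tS1\tS2\nELMO2\t-0.84\t0.018\nCREB3L1\tnull\t0.7625\nRPS11\t1.4755\tnull\n"

# Default import: columns containing "null" are not read as numeric.
x_bad <- read.delim(text = raw, row.names = 1, stringsAsFactors = TRUE)

# Return the distinct strings in a column that do not parse as numbers.
non_numeric <- function(col) {
  vals <- unique(as.character(col))
  vals <- vals[!is.na(vals)]
  vals[is.na(suppressWarnings(as.numeric(vals)))]
}
offenders <- unique(unlist(lapply(x_bad, non_numeric)))

# Re-read, declaring the offending strings as missing values.
x_ok <- read.delim(text = raw, row.names = 1, na.strings = c("NA", offenders))

# All columns are numeric now, so data.matrix() returns the real values.
m <- data.matrix(x_ok)
```

With the real data, `read.delim(text = raw, ...)` would be replaced by the original file path, and it is worth inspecting `offenders` by eye before treating those strings as missing.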

Alex A.