
I have a table with a mix of numeric and character columns; some are factors and others are integers.

> additional.metadata
      sample_id patient_id condition SOM test
1387          1          1       CTL  22    1
7588          1          1       CTL  35    2
7429          1          1       CTL  23    3
7600          1          1       CTL  35    4

I'm trying to convert the entire table to a matrix and, depending on which apply function I use (i.e. apply vs sapply), some values in $SOM change. Here's an example:

> apply(additional.metadata, 2, function(x) as.numeric(as.factor(x)))
     sample_id patient_id condition SOM test
[1,]         1          1         1   2    1
[2,]         1          1         1   4    2
[3,]         1          1         1   3    3
[4,]         1          1         1   4    4
[5,]         1          1         1   1    5
[6,]         1          1         1   3    6
> sapply(additional.metadata, function(x) as.numeric(as.factor(x)))
     sample_id patient_id condition SOM test
[1,]         1          1         1  22    1
[2,]         1          1         1  35    2
[3,]         1          1         1  23    3
[4,]         1          1         1  35    4
[5,]         1          1         1  11    5
[6,]         1          1         1  23    6

Does anyone know what I'm missing or misunderstanding? Thanks in advance.

jgarces
  • (This is a common question on SO.) `apply` always converts its first argument to a `matrix`. When there are any `character` columns, **everything** becomes `character`, period. Typically you'd use `apply` on a subset of columns, e.g., `apply(x[,c(1:3,5)], 2, ...)`, in order to only use numeric columns you truly need. If you want help with your frame (as a mix of `numeric`, `factor`, and `character`), then you *must* provide usable data in the form of `dput(head(x))`; console output is ambiguous. – r2evans Feb 04 '20 at 17:25
  • Thanks, I didn't know this matrix conversion of `apply`, very useful. – jgarces Feb 05 '20 at 07:55
  • What is your intended output? – r2evans Feb 05 '20 at 14:58
  • I'd like to convert my _$SOM_ column to numeric while keeping its original numeric values, without re-coding them (that's why I asked). Thanks – jgarces Feb 06 '20 at 09:42
  • jgarces, I don't say it without reason: the use of `dput` in giving us sample data provides ***unambiguous*** data. What you have provided is not clear, because R's console output does not differentiate between `integer`, whole `numeric`, `factor`, and numeric-looking `character`. So we cannot help you unless you provide something that is actually representative of what you have. So again, please provide data by giving us the output from `dput(head(additional.metadata))`. – r2evans Feb 06 '20 at 15:56
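
The matrix coercion described in the comments above can be reproduced with a small sketch. The data frame below is hypothetical: the thread never shows the real column types, so it assumes `SOM` is a factor whose levels are `1:40` (an assumption chosen because it reproduces the output in the question). Under that assumption, `apply` sees character strings and rebuilds the factor from only the values present, while `sapply` receives the original factor and returns its existing codes.

```r
# Hypothetical data; SOM is ASSUMED to be a factor with levels 1..40.
df <- data.frame(
  condition = rep("CTL", 6),
  SOM       = factor(c(22, 35, 23, 35, 11, 23), levels = 1:40),
  test      = 1:6,
  stringsAsFactors = FALSE
)

# apply() first calls as.matrix(); with non-numeric columns present,
# the whole matrix becomes character.
m <- as.matrix(df)
typeof(m)  # "character"

# Each column reaches FUN as a character vector, so as.factor() builds new
# levels from the strings that are present ("11", "22", "23", "35").
via_apply <- apply(df, 2, function(x) as.numeric(as.factor(x)))

# sapply() iterates over the columns with their original types, so the
# existing factor keeps its 40 levels and its codes equal its labels.
via_sapply <- sapply(df, function(x) as.numeric(as.factor(x)))

via_apply[, "SOM"]   # 2 4 3 4 1 3
via_sapply[, "SOM"]  # 22 35 23 35 11 23
```

Neither call is guaranteed to preserve the original values in general; `sapply` only appears to here because the assumed level labels coincide with the level codes.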

1 Answer


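Most likely the values change because of `as.numeric(as.factor(x))`: calling `as.numeric` on a factor returns its internal integer level codes, not the labels you see printed.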

To make sure your values stay as intended, convert the factor to character first, or skip the factor step altogether.

Use `as.numeric(as.character(as.factor(x)))` or `as.numeric(as.character(x))` instead.
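
As a minimal sketch of the difference (the values are illustrative and not taken from the asker's data), compare the level codes with the parsed labels:

```r
# Illustrative factor; its levels sort as "22", "23", "35".
f <- factor(c("22", "35", "23", "35"))

codes  <- as.numeric(f)                # level codes: 1 3 2 3
labels <- as.numeric(as.character(f))  # parsed labels: 22 35 23 35
same   <- as.numeric(levels(f))[f]     # same values, converting each level once
```

The last form is the variant mentioned in R's `?factor` help page; it gives the same result as going through `as.character`.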

An explanation of why this is needed can be found in the top answer to this question:

Changing values when converting column type to numeric

Andrew Haynes
  • Thanks for your answer, but if I use `as.numeric(as.character(x))`, some values are incorrectly converted to NAs because of the character columns. – jgarces Feb 05 '20 at 07:54
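
One possible way around the NA problem raised in the last comment is to convert only the columns whose values all parse as numbers and leave the rest untouched. The sketch below is not from the thread: `to_numeric_if_possible` is a hypothetical helper, and the data frame again assumes `SOM` is a factor.

```r
# Hypothetical helper: convert a column to numeric only when every
# non-missing value parses as a number; otherwise return it unchanged.
to_numeric_if_possible <- function(x) {
  chr <- as.character(x)                    # factor labels, not codes
  num <- suppressWarnings(as.numeric(chr))  # unparseable strings become NA
  if (all(is.na(num) == is.na(chr))) num else x
}

# Hypothetical data; SOM is ASSUMED to be a factor.
df <- data.frame(
  condition = c("CTL", "CTL", "CTL"),
  SOM       = factor(c(22, 35, 23), levels = 1:40),
  stringsAsFactors = FALSE
)

# df[] <- lapply(...) replaces the columns while keeping the data frame.
df[] <- lapply(df, to_numeric_if_possible)
str(df)
```

Here `SOM` comes back as numeric with its original values and `condition` stays character. A purely numeric matrix would still require dropping or explicitly encoding columns such as `condition`.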