Why does a data.frame in R change the data type unexpectedly?

Question

===========================================================================
Update 2/20/2021:

I looked into the problem and found that it is in the second file: Sex is originally coded as "F" and "M". When I change it with:

subject.info[subject.info$Sex=='F',]$Sex=1
subject.info[subject.info$Sex=='M',]$Sex=2

The weird thing is that R silently changed 1 to "1". What is even weirder is that the values look numeric when you print the data frame.
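A minimal sketch of why the values look numeric when printed: printing a data.frame shows character columns without quotes, so "1" and 1 look the same, while str() and class() show the real type. The toy data frame `df` below is made up for illustration and is not the original file.

```r
# Toy data frame (hypothetical values, not the original data)
df <- data.frame(Sex = c("F", "M", "F"),
                 age = c(11, 10, 18),
                 stringsAsFactors = FALSE)

# Same style of assignment as in the question: Sex stays a character column
df[df$Sex == "F", ]$Sex <- 1
df[df$Sex == "M", ]$Sex <- 2

print(df)       # Sex is displayed without quotes, so it looks numeric
str(df)         # str() reports Sex as chr
class(df$Sex)   # "character"
```

Checking with str() or class() before calling as.matrix() avoids being misled by the printed output.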

My question is why this happens, not how to convert the type of values in a data.frame. I don't understand why someone insists it is a duplicate question, even though similar answers can solve the problem.

=================================================================================

I have two text files. One file is .txt and the other is .csv. The .csv file has one additional column (with NA values); everything else is the same. When I read those files with the commands:

subject.info = read.table(paste(data_dir, "outd01_all_subject_info.txt", sep = slash), header=TRUE)

subject.info = read.csv("data_d01_features/outd01_all_subject_info2.txt", sep = ',', header=TRUE, stringsAsFactors = F)

The data frame subject.info looks the same in both cases, but when I run:

as.matrix(subject.info)

All the data in the second file are converted to strings:

     SUBJID         Sex age  trauma_age ptsd
  [1,] "600039015048" "2" "11" NA         "0" 
  [2,] "600110937794" "1" "10" NA         "0" 
  [3,] "600129552715" "1" "11" " 8"       "2" 
  [4,] "600210241146" "1" "18" "16"       "2" 
  [5,] "600294620965" "1" "13" NA         "0" 
  [6,] "600409285352" "2" "16" "15"       "1" 
  [7,] "600460215379" "1" "10" NA         "0" 
  [8,] "600547831711" "1" "10" " 6"       "1" 
  [9,] "600561317124" "2" "19" "19"       "1" 
 [10,] "600635899969" "2" "11" NA         "0" 
 [11,] "600647003585" "1" "18" NA         "0" 
 [12,] "600682103788" "1" "18" "15"       "2" 
 [13,] "600689706588" "1" "16" "15"       "2" 
 [14,] "600747749665" "2" " 9" " 7"       "1"

This does not happen for the first file:

       SUBJID Sex age ptsd
  [1,] 600039015048   2  10    0
  [2,] 600110937794   1   9    0
  [3,] 600129552715   1  10    2
  [4,] 600210241146   1  17    2
  [5,] 600294620965   1  13    0
  [6,] 600409285352   2  15    1
  [7,] 600460215379   1   8    0
  [8,] 600547831711   1   8    1
  [9,] 600561317124   2  19    1
 [10,] 600635899969   2  11    0
 [11,] 600647003585   1  19    0
 [12,] 600682103788   1  18    2
 [13,] 600689706588   1  15    2
 [14,] 600747749665   2   8    1

Is this due to the NA values? But when I replace NAs with 0 in the second file, the problem still exists:

       SUBJID         Sex age  trauma_age ptsd
  [1,] "600039015048" "2" "11" " 0"       "0" 
  [2,] "600110937794" "1" "10" " 0"       "0" 
  [3,] "600129552715" "1" "11" " 8"       "2" 
  [4,] "600210241146" "1" "18" "16"       "2" 
  [5,] "600294620965" "1" "13" " 0"       "0" 
  [6,] "600409285352" "2" "16" "15"       "1" 
  [7,] "600460215379" "1" "10" " 0"       "0" 
  [8,] "600547831711" "1" "10" " 6"       "1" 
  [9,] "600561317124" "2" "19" "19"       "1" 
 [10,] "600635899969" "2" "11" " 0"       "0" 
 [11,] "600647003585" "1" "18" " 0"       "0" 
 [12,] "600682103788" "1" "18" "15"       "2" 
 [13,] "600689706588" "1" "16" "15"       "2" 
 [14,] "600747749665" "2" " 9" " 7"       "1"

The problem also persists if I convert the second file to a .csv file, or if I use read.table or read.csv2.

The second issue is different from the first one, so you should ask it as a new question instead of updating the original one after 18 days. The reason `subject.info[subject.info$Sex=='F',]$Sex=1` produces "1" is that a column can hold data of only one type. When you change "F" to 1, the same column still contains character data, i.e. "M". Because of that, R converts 1 to "1", since a column cannot hold data of mixed types. — Ronak Shah, Feb 21 '21 at 00:14
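The coercion described in this comment can be reproduced on a plain vector. The sketch below also shows one possible way to recode in a single step so that the result is numeric from the start; the names `sex`, `sex_raw`, `codes`, and `sex_num`, and the 1/2 coding, are illustrative assumptions.

```r
# An atomic vector holds a single type: assigning a number into a
# character vector coerces the number to character.
sex <- c("F", "M", "F")
sex[sex == "F"] <- 1
sex           # "1" "M" "1"
class(sex)    # "character"

# Recode the whole vector at once through a named lookup vector,
# so no character value is left behind to force coercion.
sex_raw <- c("F", "M", "F")
codes   <- c(F = 1, M = 2)            # hypothetical coding
sex_num <- unname(codes[sex_raw])
sex_num         # 1 2 1
class(sex_num)  # "numeric"
```

Applied to a data frame, the same idea would replace the column in one assignment rather than patching subsets of rows one value at a time.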

score 3 · Answer 1 · answered Feb 02 '21 at 01:52

From the output it looks like the column trauma_age is of class character, which turns everything into character. Check class(subject.info$trauma_age).
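As a rough illustration of that check, the sketch below lists the class of every column and shows that as.matrix() returns a character matrix as soon as one column is non-numeric, while NA values alone do not cause this. The data frame `df` uses made-up values and assumes Sex is the character column.

```r
# Toy data frame with one character column (hypothetical values)
df <- data.frame(SUBJID = c(600039015048, 600110937794),
                 Sex = c("2", "1"),
                 age = c(11L, 10L),
                 trauma_age = c(NA, 8L),
                 stringsAsFactors = FALSE)

# Class of every column; look for anything that is not numeric or integer
col_classes <- sapply(df, class)
col_classes

# One character column is enough to make the whole matrix character
m_all <- as.matrix(df)
typeof(m_all)    # "character"

# Without the character column the matrix is numeric, despite the NA
m_num <- as.matrix(df[, col_classes != "character"])
typeof(m_num)    # "double"
```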

Convert it to numeric with:

subject.info$trauma_age <- as.numeric(subject.info$trauma_age)

and then try converting to a matrix, i.e. as.matrix(subject.info).

You can also use type.convert to convert every column automatically to its appropriate type, without having to name the columns.

subject.info <- type.convert(subject.info, as.is = TRUE)
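A small sketch of what this call does, using a made-up data frame `df` whose columns were read as character. It assumes an R version in which type.convert has a data.frame method, as the call above does.

```r
# Toy data frame where numbers are stored as strings (hypothetical values)
df <- data.frame(Sex = c("2", "1", "1"),
                 trauma_age = c(NA, "8", "16"),
                 stringsAsFactors = FALSE)
sapply(df, class)     # both columns are "character"

# type.convert picks a suitable type per column; as.is = TRUE keeps
# any remaining strings as character instead of turning them into factors
df2 <- type.convert(df, as.is = TRUE)
sapply(df2, class)    # both columns are now "integer"

is.numeric(as.matrix(df2))   # TRUE: the matrix is no longer character
```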

`subject.info <- type.convert(subject.info, as.is = TRUE)` This works perfectly! Thanks a lot! — Xin Niu, Feb 02 '21 at 01:56
