===========================================================================
Update 2/20/2021:
I just looked into the problem and found that it is in the second file: Sex was originally coded as "F" and "M". When I recode it with:
subject.info[subject.info$Sex=='F',]$Sex=1
subject.info[subject.info$Sex=='M',]$Sex=2
the weird thing is that R silently changed 1 to "1". What is even weirder is that the values still look numeric when you print the data frame.
My question is why this happens, not how to convert the type of values in a data.frame. I don't understand why someone insists this is a duplicate question, even though answers to similar questions can solve the problem.
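The behavior described in this update can be reproduced with a small made-up data frame. This is only a sketch: `df` and its values are hypothetical stand-ins for subject.info, not the real data. It suggests that the column stays character after the assignment, so the numbers are coerced to strings, and that print() on a data frame simply does not show the quotes.

```r
# Hypothetical toy data standing in for subject.info (not the real file)
df <- data.frame(Sex = c("F", "M", "F"), age = c(11, 9, 18),
                 stringsAsFactors = FALSE)

# Same style of assignment as above: Sex is a character column,
# so the numeric 1 and 2 are coerced to "1" and "2" when stored
df[df$Sex == "F", ]$Sex <- 1
df[df$Sex == "M", ]$Sex <- 2

print(df)        # data frames print strings without quotes
class(df$Sex)    # "character"
str(df)          # str() shows the real type of each column

# A matrix can hold only one type, so a single character column
# makes as.matrix() return a character matrix
m <- as.matrix(df)
typeof(m)        # "character"
```

Checking with str() or class() rather than print() is what makes the character column visible.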
=================================================================================
I have two text files. One file is .txt and the other is .csv. The .csv file has one additional column (with NA values). Everything else is the same. When I read those files with the commands:
subject.info = read.table(paste(data_dir, "outd01_all_subject_info.txt", sep = slash), header=TRUE)
subject.info = read.csv("data_d01_features/outd01_all_subject_info2.txt", sep = ',', header=TRUE, stringsAsFactors = F)
The data frame subject.info looks the same in both cases, but when I run:
as.matrix(subject.info)
All the data in the second file are converted to strings:
SUBJID Sex age trauma_age ptsd
[1,] "600039015048" "2" "11" NA "0"
[2,] "600110937794" "1" "10" NA "0"
[3,] "600129552715" "1" "11" " 8" "2"
[4,] "600210241146" "1" "18" "16" "2"
[5,] "600294620965" "1" "13" NA "0"
[6,] "600409285352" "2" "16" "15" "1"
[7,] "600460215379" "1" "10" NA "0"
[8,] "600547831711" "1" "10" " 6" "1"
[9,] "600561317124" "2" "19" "19" "1"
[10,] "600635899969" "2" "11" NA "0"
[11,] "600647003585" "1" "18" NA "0"
[12,] "600682103788" "1" "18" "15" "2"
[13,] "600689706588" "1" "16" "15" "2"
[14,] "600747749665" "2" " 9" " 7" "1"
This does not happen for the first file:
SUBJID Sex age ptsd
[1,] 600039015048 2 10 0
[2,] 600110937794 1 9 0
[3,] 600129552715 1 10 2
[4,] 600210241146 1 17 2
[5,] 600294620965 1 13 0
[6,] 600409285352 2 15 1
[7,] 600460215379 1 8 0
[8,] 600547831711 1 8 1
[9,] 600561317124 2 19 1
[10,] 600635899969 2 11 0
[11,] 600647003585 1 19 0
[12,] 600682103788 1 18 2
[13,] 600689706588 1 15 2
[14,] 600747749665 2 8 1
Is this due to the NA values? But when I replace NAs with 0 in the second file, the problem still exists:
SUBJID Sex age trauma_age ptsd
[1,] "600039015048" "2" "11" " 0" "0"
[2,] "600110937794" "1" "10" " 0" "0"
[3,] "600129552715" "1" "11" " 8" "2"
[4,] "600210241146" "1" "18" "16" "2"
[5,] "600294620965" "1" "13" " 0" "0"
[6,] "600409285352" "2" "16" "15" "1"
[7,] "600460215379" "1" "10" " 0" "0"
[8,] "600547831711" "1" "10" " 6" "1"
[9,] "600561317124" "2" "19" "19" "1"
[10,] "600635899969" "2" "11" " 0" "0"
[11,] "600647003585" "1" "18" " 0" "0"
[12,] "600682103788" "1" "18" "15" "2"
[13,] "600689706588" "1" "16" "15" "2"
[14,] "600747749665" "2" " 9" " 7" "1"
The problem also persists if I convert the second file to a .csv file, and if I use read.table or read.csv2.
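One way to test the NA hypothesis in isolation is sketched below. The data frames `num_only` and `mixed` are hypothetical toy examples, not the real files. The sketch suggests that NA values alone leave the matrix numeric, while a single character column is enough to turn every cell into a string.

```r
# Hypothetical toy data: numeric columns only, one of them containing NA
num_only <- data.frame(age = c(11, 9, 18), trauma_age = c(NA, 7, 16))
typeof(as.matrix(num_only))   # "double": NA by itself keeps the matrix numeric

# Add one character column and the whole matrix becomes character
mixed <- num_only
mixed$Sex <- c("2", "1", "1")
typeof(as.matrix(mixed))      # "character"

# List the class of each column to find the non-numeric one
col_classes <- sapply(mixed, class)
print(col_classes)
```

Running sapply(subject.info, class) on the real data frame right after reading the file should show which column is not numeric.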