I have a data.tsv file (tabs separate entries). The full file can be found here.

The entries in the file look like this:

">173D:C"   "TVPGVXTVPGV"   "CCSCCCCCCCC"
">173D:D"   "TVPGVXTVPGV"   "CCCCCCCCSCC"
">185D:A"   "SAXVSAXV"  "CCBCCCBC"
">1A0M:B"   "GCCSDPRCNMNNPDYCX" "CCTTSHHHHHTCTTTCC"
">1A0M:A"   "GCCSDPRCNMNNPDYCX" "CGGGSHHHHHHCTTTCC"
">1A0N:A"   "PPRPLPVAPGSSKT"    "CCCCCCCCSTTCCC"

I am trying to read the string entries into a data frame (a table with 3 columns):

data = data.frame(read.csv(file = './data.tsv', header = FALSE, sep = '\t'))

but only the first column is read. All other columns are empty.

I also tried different commands, such as

data = read.csv(file = './data.tsv', header = FALSE, sep = '\t')
data = read.csv(file = './data.tsv', sep = '\t')
data = data.frame(read.csv(file = './data.tsv'))

but without success. Can anyone see why the input is not read correctly?

mercury0114
  • You don't need `data.frame()`; `read.csv` already returns a data frame. Use `read.table` for tab-separated data. Check that you really have tabs in your data, not multiple spaces. –  May 28 '18 at 13:06
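
The check suggested in the comment above can be sketched roughly as follows. The two sample strings are hypothetical stand-ins for one line of data.tsv (one with real tabs, one with runs of spaces); on the real file the same test would be applied to the output of `readLines`.

```r
# Hypothetical sample lines: the first uses real tab characters,
# the second uses runs of spaces between the quoted fields.
tab_line   <- '">173D:C"\t"TVPGVXTVPGV"\t"CCSCCCCCCCC"'
space_line <- '">173D:C"   "TVPGVXTVPGV"   "CCSCCCCCCCC"'

# fixed = TRUE searches for a literal tab character rather than a regex.
has_tab <- grepl("\t", c(tab_line, space_line), fixed = TRUE)
print(has_tab)

# On the actual file, the equivalent check on the first few lines would be:
# any(grepl("\t", readLines("./data.tsv", n = 10), fixed = TRUE))
```

If the check reports no tabs, `sep = '\t'` has nothing to split on, which would be consistent with the columns not being separated.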

2 Answers

Using the file created in the Note at the end, this works, because `read.table` splits on any whitespace by default (`sep = ""`):

DF <- read.table("myfile.dat", as.is = TRUE)

gives:

> DF
       V1                V2                V3
1 >173D:C       TVPGVXTVPGV       CCSCCCCCCCC
2 >173D:D       TVPGVXTVPGV       CCCCCCCCSCC
3 >185D:A          SAXVSAXV          CCBCCCBC
4 >1A0M:B GCCSDPRCNMNNPDYCX CCTTSHHHHHTCTTTCC
5 >1A0M:A GCCSDPRCNMNNPDYCX CGGGSHHHHHHCTTTCC
6 >1A0N:A    PPRPLPVAPGSSKT    CCCCCCCCSTTCCC

Note

Lines <- '">173D:C"   "TVPGVXTVPGV"   "CCSCCCCCCCC"
">173D:D"   "TVPGVXTVPGV"   "CCCCCCCCSCC"
">185D:A"   "SAXVSAXV"  "CCBCCCBC"
">1A0M:B"   "GCCSDPRCNMNNPDYCX" "CCTTSHHHHHTCTTTCC"
">1A0M:A"   "GCCSDPRCNMNNPDYCX" "CGGGSHHHHHHCTTTCC"
">1A0N:A"   "PPRPLPVAPGSSKT"    "CCCCCCCCSTTCCC"'
writeLines(Lines, "myfile.dat")
G. Grothendieck

Use `sep = ''`, which splits on any run of whitespace (spaces or tabs):

data = read.csv(file = './data.tsv', header = FALSE, sep = '')

See this answer.
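
As a minimal sketch of why this helps, the snippet below writes three of the rows from the question to a temporary file and reads them back with `sep = ''`. The temporary file, the variable names, and the mix of spaces and tabs are assumptions made for illustration; the real data.tsv may differ.

```r
# Three rows from the question. Two use runs of spaces and one uses real tabs
# (an assumption, to show that both kinds of whitespace are handled).
rows <- c(
  '">173D:C"   "TVPGVXTVPGV"   "CCSCCCCCCCC"',
  '">173D:D"\t"TVPGVXTVPGV"\t"CCCCCCCCSCC"',
  '">1A0N:A"   "PPRPLPVAPGSSKT"    "CCCCCCCCSTTCCC"'
)
path <- tempfile(fileext = ".tsv")  # hypothetical stand-in for ./data.tsv
writeLines(rows, path)

# sep = "" treats any run of spaces or tabs as a single separator;
# the surrounding double quotes are stripped by the default quote setting.
by_ws <- read.csv(path, header = FALSE, sep = "", stringsAsFactors = FALSE)
str(by_ws)
```

With this input the result has three columns (V1, V2, V3), one per quoted field.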

user387832