I'm trying to extract the first column from this file. It is a sequence of 16-digit numbers that should be treated as strings. The problem is that when I write the data to a text file, certain values seem to have changed. Following is the code I'm using.

dataMaster = read.table("Master.txt", header = F, colClasses = rep("character",67))

write.table(dataMaster$V1, "sequence.txt", col.names = F, row.names = F, 
            quote = F, sep = "\n")

Below is an example showing the same two rows, 261182 and 261183, in both files. There are quite a few occurrences of the same error. It seems that when I write the file, the digit 9 gets replaced with the digit 0.

[Screenshot of the two rows; image not available]

The master file was processed in a Mac environment and I'm working in a Windows environment.

zx8754
SriniShine

2 Answers

Another approach, if you want the sequence to be a character, is to specify your colClasses:

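library(dplyr)

# Write only the first column, one value per line
dataMaster %>% 
  select(1) %>% 
  write.table("sequence.txt", col.names = FALSE, row.names = FALSE, 
              quote = FALSE, sep = "\n")

# Read it back as character so the 16-digit values are not converted to numeric
sequence <- read.table("sequence.txt", colClasses = "character")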

dataMaster[c(261182, 261183), 1]
#[1] "9171513174761179" "9171513174771179"

sequence[c(261182, 261183), ]
#[1] "9171513174761179" "9171513174771179"
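
For anyone without the original Master.txt, the same round trip can be sketched on made-up data. The two-column data frame, the temporary file paths, and the names `ids`, `master_path`, `seq_path` and `check` are assumptions for illustration only; the two identifiers are the ones shown in the output above.

```r
# Hypothetical stand-in for Master.txt: two rows, two columns
ids <- c("9171513174761179", "9171513174771179")
master <- data.frame(V1 = ids, V2 = c("a", "b"), stringsAsFactors = FALSE)

master_path <- tempfile(fileext = ".txt")
write.table(master, master_path, col.names = FALSE, row.names = FALSE,
            quote = FALSE)

# Read every column as character, as in the question
dataMaster <- read.table(master_path, header = FALSE,
                         colClasses = rep("character", 2))

# Write the first column, one value per line
seq_path <- tempfile(fileext = ".txt")
write.table(dataMaster$V1, seq_path, col.names = FALSE, row.names = FALSE,
            quote = FALSE, sep = "\n")

# Read it back as character and compare with the original strings
check <- read.table(seq_path, colClasses = "character")
identical(check$V1, ids)
```

If the last line returns TRUE, the values survived the write and the read unchanged, which suggests checking how the output file is being opened or re-read when digits appear to differ.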
patL

If you want to operate with character strings, please see patL's answer. What follows was my answer for treating the sequence of numbers as numeric.


You may need to increase the scipen value under options:

options(scipen=999)

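A large scipen makes R prefer fixed notation over scientific notation when it prints or writes numbers. It does not add precision: a double holds only about 15 to 17 significant digits, so a 16-digit value may still change once it is stored as numeric.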
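
As a rough illustration of why the numeric route is fragile for identifiers like these, the sketch below converts one 16-digit value from the question to a double and formats it with a large scipen. The names `id`, `x`, `old` and `written` are made up for this example.

```r
id <- "9171513174761179"   # 16-digit identifier taken from the question
x  <- as.numeric(id)       # stored as a double from here on

# Doubles represent every integer exactly only up to 2^53
x > 2^53

# Temporarily prefer fixed notation, then restore the previous options
old <- options(scipen = 999)
written <- format(x)
options(old)

# Above 2^53 only even integers are representable, so the odd
# identifier cannot come back unchanged
written == id
```

The first comparison is TRUE and the second is FALSE: scipen only affects the notation, so keeping the column as character is the safer choice when the values are labels rather than quantities.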

In terms of input, if you were specifying character only to avoid losses, note that you should instead change the numerals argument of read.table to "no.loss" as the default (first option) will lead to loss without warning. I've copied the relevant text from the help page ?read.table below.

read.table(..., numerals = c("allow.loss", "warn.loss", "no.loss"))
        string indicating how to convert numbers whose conversion to
        double precision would lose accuracy, see type.convert. Can be
        abbreviated. (Applies also to complex-number inputs.)
Fons MA
  • @FonsMA I've tried with options(scipen=999) and it worked. Thank you. – SriniShine Feb 13 '19 at 09:41
    This is a bad answer. If the data is character, then it should be read and written as character, rather than fiddling with significant figures. – Hong Ooi Feb 13 '19 at 09:59
    I stand corrected, I missed the "should be treated as a string" several times... I can't delete a "correct" answer, so I've edited to reflect your point and indicate patL's answer below – Fons MA Feb 13 '19 at 10:12
    No worries. Props for being able to accept criticism; a lot of people would fly off the handle.... – Hong Ooi Feb 13 '19 at 10:18
  • @FonsMA just to be clear for the future references I will mark the patL's answer as the correct answer. – SriniShine Feb 13 '19 at 11:04