0

I have a large plain-text file to read in R, where all the data is on a single line with no spaces (a DNA sequence with no header). I found the following function:

readChar("filename", nchars = n)

which reads just the first "n" characters of the file, saving a lot of time. Is there another function in R that goes further and reads only from a START position to a STOP position, without loading the whole file?

Tomás Navarro

2 Answers

1

As far as I know, basically no: you need to read the file and then discard the characters that you don't want. For example, to keep only the first 10 characters:

substr(readChar("filename", nchars = n), 1, 10)

However, this post (How to efficiently read the first character from each line of a text file?) shows some ways to make that more efficient.

  • 1
    Thank you Ricardo, I had not found that post. It is what I was looking for but, unfortunately, it seems it is not possible to read a file starting from a position other than the beginning. Anyway, using readChar instead of scan improves the execution time a lot. On the other hand, I do not see any difference between stri_sub from stringi and substring from base when reading large files. Thanks again! – Tomás Navarro Oct 22 '20 at 10:01
1

You have to create a connection, then use the seek function. Do not forget to close the connection afterwards.

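For example, this reads 100 characters starting at byte offset 1000. Offsets passed to seek count from 0, so for a plain ASCII file reading begins at the 1001st character.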

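cx <- file("filename", "rb")      # open the file in binary mode
seek(cx, where = 1000)            # move to byte offset 1000 (0-based)
d <- readChar(cx, nchars = 100)   # read the next 100 characters
close(cx)                         # release the connection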
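
The same idea can be wrapped into a function that takes START and STOP positions, as asked in the question. This is only a minimal sketch: `read_range` is a hypothetical helper name (not part of base R), positions are assumed to be 1-based and inclusive, and the file is assumed to contain single-byte (ASCII) characters so that byte offsets and character positions coincide.

```r
# Hypothetical helper: read characters start..stop (1-based, inclusive)
# from a file without loading the whole file into memory.
# Assumes single-byte (ASCII) content, e.g. a plain DNA sequence.
read_range <- function(path, start, stop) {
  stopifnot(start >= 1, stop >= start)
  cx <- file(path, "rb")        # binary mode, so seek() works on byte offsets
  on.exit(close(cx))            # close the connection even if an error occurs
  seek(cx, where = start - 1)   # seek() offsets are 0-based
  readChar(cx, nchars = stop - start + 1, useBytes = TRUE)
}

# Usage on a small temporary file
tmp <- tempfile()
cat("ACGTACGTAC", file = tmp)
read_range(tmp, 3, 6)   # "GTAC"
```

Using `on.exit(close(cx))` means the connection is closed even if `seek` or `readChar` fails, so there is no separate `close` call to forget.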
Sci Prog