The first 5 rows of a large file (1,000,000 rows in total) are as follows:

c6 c24 c32 c54 c67
c6 c24 c32 c51 c68 c78
c6 c32 c54 c67
c6 c32 c55 c63 c85 c94 c75
c6 c32 c53 c67

readLines() can read one row at a time starting from the first row, but it is not efficient when I want to read, say, the 20001st row. Are there R functions that can read and delete a specific row from a large file? Thank you.

user2405694
  • What about reading line by line in a loop and writing to a file on the fly? That way you can drop all the rows you don't need. – storaged Jun 21 '13 at 12:27
  • Is my understanding correct that you want to delete a line without reading/writing the whole file? I believe there are more appropriate tools for this than R. – Roland Jun 21 '13 at 13:11
  • Why did you want to use `R` for a file-editing function? If that's all you're doing, far better to use some shell commands or a text editor of your choice. – Carl Witthoft Jun 21 '13 at 14:53
  • The best answer is http://stackoverflow.com/questions/1874443/import-data-into-r-with-an-unknown-number-of-columns – user2405694 Jun 23 '13 at 01:37

2 Answers

If you just want to read the file, how about using scan, which has both a skip and an nlines argument?

# what = "" reads the fields as text; scan() expects numbers by default
scan("myfile", what = "", skip = 20000, nlines = 1)

I am not sure about deleting, however. Usually with R everything is possible, but I think you have to read the whole file in before you can delete the line, if you want a complete copy of the original file minus that specific line.
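If memory is the concern, one possible approach (a sketch, not something tested on the asker's file) is to stream the file through connections in chunks and copy everything except the unwanted line to a new file. The helper name `delete_line` and its `chunk_size` parameter are hypothetical, introduced only for illustration; the whole file is still read once, just never held in memory all at once.

```r
# Hypothetical helper: copy `infile` to `outfile`, skipping line number `n`.
# Reads `chunk_size` lines at a time, so memory use stays bounded.
delete_line <- function(infile, outfile, n, chunk_size = 10000L) {
  stopifnot(n >= 1, infile != outfile)
  con_in <- file(infile, open = "r")
  on.exit(close(con_in), add = TRUE)
  con_out <- file(outfile, open = "w")
  on.exit(close(con_out), add = TRUE)

  offset <- 0L  # number of input lines consumed so far
  repeat {
    chunk <- readLines(con_in, n = chunk_size)
    len <- length(chunk)
    if (len == 0L) break                 # end of file
    idx <- n - offset                    # position of line n inside this chunk
    if (idx >= 1L && idx <= len) chunk <- chunk[-idx]
    writeLines(chunk, con_out)
    offset <- offset + len
  }
  invisible(outfile)
}
```

For example, `delete_line("myfile", "myfile_trimmed", n = 20001)` would write a copy without the 20001st row; the output file name here is an assumption, and replacing the original afterwards (for instance with `file.rename`) is left to the caller.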

Simon O'Hanlon

Two caveats: (1) this comes years after the question was asked; (2) it only works for replacing the last line. Despite point 2, I think it could be adapted to modify a specific line other than the last one.

Rather than using read.table and write.table, which are slow on large arrays, readLines and writeLines appear to be more efficient. In the following example I remove the last line of a large array and replace it with new text.

Set up the example by creating a large array and saving as a file:

write.table(
  array(runif(1000000), dim = c(1000, 1000)),
  file = "BigArray.r", row.names = FALSE, col.names = FALSE, sep = "\t")

Open the big array file using readLines, remove the last line and then write it again. Separately, use writeLines to add a new final line:

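time = proc.time()
BigArray = readLines("BigArray.r")               # each row becomes one string
BigArray = BigArray[-length(BigArray)]           # drop the last row
writeLines(BigArray, "BigArray.r", sep = "\n")   # rewrite the file without it
# append the replacement row as 1000 tab-separated values
write(seq(1, 1000, 1), ncolumns = 1000, file = "BigArray.r", append = TRUE, sep = "\t")
proc.time() - time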

user  system elapsed 
0.69    0.10    0.85 

This performs better than the alternative:

time = proc.time()
BigArray = read.table("BigArray.r", sep = "\t")
BigArray[1000, ] = seq(1, 1000, 1)
write.table(BigArray, file = "BigArray.r", row.names = FALSE, col.names = FALSE,
            sep = "\t")
proc.time() - time

user  system elapsed 
3.62    0.11    3.75

Somebody may be able to do a better job of replacing a specific line in the middle of the array; I couldn't get the new line into the same text format that readLines produces.
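One possible way around that formatting problem, offered as a sketch rather than as the answer's own method: `paste(values, collapse = "\t")` turns a vector into a single tab-separated string, which is the same shape as the elements readLines returns. The helper name `replace_line` is hypothetical. Number formatting from `paste` may differ slightly from what write.table produced for the other rows.

```r
# Hypothetical helper: overwrite line `n` of a delimited text file with `values`.
replace_line <- function(path, n, values, sep = "\t") {
  lines <- readLines(path)                  # one string per row
  stopifnot(n >= 1, n <= length(lines))
  lines[n] <- paste(values, collapse = sep) # build the row as a single string
  writeLines(lines, path)                   # write everything back
  invisible(path)
}
```

With the example file above, `replace_line("BigArray.r", 500, seq(1, 1000, 1))` would replace row 500 instead of the last row.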

JamesF