
I have several data frames that start with a bit of text. Sometimes the information I need starts at row 11 and sometimes at row 16, for instance; it varies. What all the data frames have in common is that the useful information starts after a row with the title "location".

I'd like to write a loop that deletes, in each data frame, all the rows above the useful information (including the row with "location").

Ashoka
Welcome to StackOverflow! Please read the info about how to [ask a question](http://stackoverflow.com/help/how-to-ask) and how to produce a [minimal reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). It’s always good to at least post some sample data (and maybe give an example of what you think the output should be). Also share any code that you’ve tried so far. This will make it much easier for others to help you. – Jaap May 25 '14 at 17:16

1 Answer


I'm guessing that you want something like this:

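```r
readfun <- function(fn, n = -1, target = "location", ...) {
  r <- readLines(fn, n = n)       # raw lines (only the first n if n > 0)
  locline <- grep(target, r)[1]   # first line matching target (NA if none)
  if (is.na(locline)) stop("target not found in ", fn, ": ", target)
  # skip everything up to and including the target line; ... goes to read.table
  read.table(fn, skip = locline, ...)
}
```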
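A minimal sketch of the loop the question asks for, assuming each file is whitespace-delimited with a header row directly after the "location" line. The file contents are made up for illustration (the two preambles mimic data starting at row 11 and at row 16), `make_file` is a hypothetical helper, and `readfun` is repeated so the block runs on its own.

```r
# copy of readfun from the answer, so this block runs standalone
readfun <- function(fn, n = -1, target = "location", ...) {
  r <- readLines(fn, n = n)
  locline <- grep(target, r)[1]
  read.table(fn, skip = locline, ...)
}

# Hypothetical helper: write a temp file with `pre` preamble lines,
# then the "location" line, then a small table with a header row
make_file <- function(pre) {
  fn <- tempfile(fileext = ".txt")
  writeLines(c(paste("preamble line", seq_len(pre)),
               "location",
               "x y",
               "1 2.5",
               "2 3.5"), fn)
  fn
}

files <- c(make_file(10), make_file(15))  # "location" on line 11 and on line 16

# Loop over all files; header = TRUE is passed through ... to read.table
dfs <- lapply(files, readfun, header = TRUE)
str(dfs[[2]])
```

Using `lapply` rather than an explicit `for` loop collects the cleaned data frames in a list, one element per file.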

This is fairly inefficient because it reads the data file twice (once as raw character strings and once as a data frame), but it should work reasonably well if your files are not too big. (@MrFlick points out in the comments that if you have a reasonable upper bound on how far into the file your target will occur, you can set n so that you don't have to read the whole file just to search for the target.)
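To illustrate the bounded search: with `n` set, `readLines` stops after `n` lines, so finding the target never requires reading the rest of the file. A sketch on a made-up file with 10,000 data rows, assuming "location" is known to occur within the first 20 lines:

```r
set.seed(1)
fn <- tempfile(fileext = ".txt")
# Made-up file: 15 preamble lines, "location" on line 16, then a large table
writeLines(c(paste("note", 1:15),
             "location",
             "id value",
             paste(1:10000, round(runif(10000), 3))), fn)

head_lines <- readLines(fn, n = 20)         # reads only the first 20 lines
locline <- grep("location", head_lines)[1]  # position of the target line
dat <- read.table(fn, skip = locline, header = TRUE)
```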

I don't know any other details of your files, but it might be safer to use "^location" to identify a line that begins with that string, or some other more specific target ...
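A small illustration of why an anchored pattern is safer, using made-up lines: an unanchored "location" also matches any line that merely contains the word somewhere, such as "Relocation".

```r
txt <- c("Relocation of station: none",
         "Site info",
         "location",
         "x y")
grep("location", txt)    # 1 3: "Relocation" contains "location"
grep("^location", txt)   # 3: only lines that start with "location"
grep("^location$", txt)  # 3: only a line that is exactly "location"
```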

Ben Bolker
It may be inefficient but it's a lot better than reading it in as a data.frame first and then removing rows, because that would mess up the detection of the column classes. If you know "location" always occurs before a certain line (say 20), then you can use `readLines(fn, n=20)` to avoid reading the whole file. – MrFlick May 25 '14 at 18:36
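A sketch of the column-class problem, using a made-up file: if the preamble is read as part of the table, the text lines force every column to character, so after deleting the rows the classes (and the column names) have to be fixed by hand, here with `type.convert`. Skipping the preamble lets `read.table` detect them directly.

```r
fn <- tempfile(fileext = ".txt")
# Made-up file: one title line, the "location" line, then a small table
writeLines(c("Report title", "location", "x y", "1 2.5", "2 3.5"), fn)

# Naive approach: read the whole file, then delete the rows above the data.
# fill = TRUE pads the short "location" line; here every column is character
all_rows <- read.table(fn, header = FALSE, fill = TRUE, stringsAsFactors = FALSE)
naive <- all_rows[-(1:3), ]                          # drop title, "location", header row
naive[] <- lapply(naive, type.convert, as.is = TRUE) # re-detect classes by hand
# the names x and y are lost too: the columns are still V1 and V2

# Skipping approach: read.table sees only the table, so classes and names are right
good <- read.table(fn, skip = 2, header = TRUE)
sapply(good, class)
```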