
I have files with contents similar to this:

    !software version: $Revision$
    !date: 07/06/2016 $
    !
    ! from Mouse Genome Database (MGD) & Gene Expression Database (GXD)
    !
    MGI

I am using `read.csv` to read the files, but I need to skip the lines that begin with `!`. How can I do that?

user1631306
  • You can set the `skip` parameter, but it only takes an integer number of lines to skip. You could calculate that number with `readLines` and `grep` if you're doing it programmatically or often; otherwise it's probably easiest to just look at the file. – alistaire Jul 26 '16 at 17:07
  • I have multiple files, and the number of lines is not consistent across them. Sometimes it's 12, sometimes it's 45. That's why I can't use `skip`. – user1631306 Jul 26 '16 at 17:11
  • You could use one of the answers from [this question](http://stackoverflow.com/questions/27747426/how-to-efficiently-read-the-first-character-from-each-line-of-a-text-file) to determine which lines start with `!`, then use `read.csv`'s `skip` argument based on that. – Rich Scriven Jul 26 '16 at 17:40
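A minimal sketch of that idea, assuming the `!` lines always form one contiguous block at the top of each file: `startsWith` tests the first character of every line, and `cumprod` keeps counting only until the first line that does not start with `!`, so the sum is the value to pass as `skip`. The helper names `count_bang_header` and `make_demo_file` and the temporary demo files are hypothetical.

```r
# Hypothetical helper: number of leading lines that begin with "!".
count_bang_header <- function(path) {
  is_bang <- startsWith(readLines(path), "!")
  # cumprod() stays 1 through the leading run of TRUEs and drops to 0 at the
  # first FALSE, so the sum counts only the header block at the top.
  sum(cumprod(is_bang))
}

# Demo: two temporary files whose headers have different lengths (made-up data).
make_demo_file <- function(n_header) {
  path <- tempfile(fileext = ".csv")
  writeLines(c(rep("! header line", n_header), "MGI,Symbol", "MGI:1,Abc"), path)
  path
}
files <- c(make_demo_file(12), make_demo_file(45))

# Each file gets its own skip value, so a varying header length is not a problem.
tables <- lapply(files, function(f) read.csv(f, skip = count_bang_header(f)))
```

Because only the leading block is skipped, a `!` that appears later in a file is left untouched.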

2 Answers

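The `read.csv` function, and the `read.table` function it is based on, have an argument called `comment.char` that specifies a character marking the start of a comment: when that character is seen, the rest of the line is ignored. Setting it to `"!"` may be enough to do what you want. Note that a `!` in the middle of an unquoted field will also cut off the rest of that line.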
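A small sketch of the `comment.char` approach. The sample lines are made up to resemble the question's header and are written to a temporary file in place of the real data; the last call illustrates the caveat about a `!` inside a value.

```r
# Sample input resembling the question's file (the data rows are made up).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "!software version: $Revision$",
  "!date: 07/06/2016 $",
  "!",
  "MGI,Symbol",
  "MGI:1,Abc"
), path)

# Everything from "!" to the end of a line is ignored; lines left empty
# are dropped because blank.lines.skip defaults to TRUE.
df <- read.csv(path, comment.char = "!")

# Caveat: an unquoted "!" inside a value also starts a comment ("hi!there" -> "hi").
caveat <- read.csv(text = "a,b\n1,hi!there", comment.char = "!")
```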

If you really need a regular expression, the best approach is to read the file with `readLines` (or a similar function), apply the regular expression to the resulting character vector to drop the unwanted elements (rows), and then pass the result to the `text` argument of `read.table` (or use a text connection).
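A minimal sketch of that approach, using made-up sample lines in a temporary file. The pattern `^[[:space:]]*!` is an assumption about what counts as a header line; it also catches `!` lines with leading whitespace. Because `read.csv` keeps its default `comment.char = ""`, the `!` inside the data value survives.

```r
# Sample input: header lines start with "!", and one data value contains "!".
path <- tempfile(fileext = ".csv")
writeLines(c("!software version: $Revision$", "!", "MGI,Symbol", "MGI:1,Abc!x"), path)

raw <- readLines(path)
# Drop lines whose first non-blank character is "!".
keep <- raw[!grepl("^[[:space:]]*!", raw)]

# read.csv/read.table accept a character vector as `text`, one element per line.
df <- read.csv(text = keep)
# Equivalent, with an explicit text connection:
# df <- read.csv(textConnection(keep))
```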

Greg Snow

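To calculate the number of lines to skip, find the first line that doesn't start with `!` and subtract one (`skip` is the number of lines to discard, so using the line's index directly would also skip the header):

    to_skip <- min(grep('^[^!]', trimws(readLines('file.csv')))) - 1

    df <- read.csv('file.csv', skip = to_skip)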
alistaire