3

I have many string files (.str) and I want to import them into R, looping over the files. The problem is that the first line is neither the column names nor the start of the matrix; it is a comment line. The same goes for the last line. The matrix I want to import sits between those two lines. How can I do that?

Thanks

  • Welcome to SO. Please read this on how to create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). In this case, for example, you should add a sample of your file and show what you have tried. – agstudy Jul 05 '13 at 09:26
  • Are they preceded by any kind of comment character? – James Jul 05 '13 at 09:26
  • Read `?read.table`. The parameters `skip`, `nrows`, and `comment.char` might be relevant to you. – Roland Jul 05 '13 at 09:28
  • If the files don't all have identical structure, you can always read in with `readLines` and then use regexp functions to remove lines you don't want before converting to your intended data structure. – Thomas Jul 05 '13 at 09:41
  • Thanks, guys. Roland, I can't use `nrows`: the number of rows varies from file to file. –  Jul 05 '13 at 09:41
  • @user2551551 But if the ***first*** line is the one you want to skip, just use `skip = 1` in `read.table` to skip the first line and carry on as normal, e.g. `read.table( "myfile.txt" , skip = 1 , header = TRUE )` – Simon O'Hanlon Jul 05 '13 at 10:11

3 Answers

6

If you want to skip the first and last lines in a file, you can do it as follows. Use `readLines` to read the file into a character vector, and then pass that vector to `read.csv` through its `text` argument.

strs <- readLines("filename.csv")
dat <- read.csv(text=strs,             # read from an R object rather than a file
                skip=1,                # skip the first line
                nrows=length(strs) - 3 # skip the last line
                )

The `- 3` is because the number of rows of data is 3 less than the number of lines of text in the file: 1 skipped line at the beginning, 1 line of column headers, and 1 skipped line at the end. Of course, you could also omit the `nrows` argument and delete the nonsense row from your data frame after the import, though a non-numeric last line may cause numeric columns to be read as text.
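
Since the question asks about looping over many files whose row counts differ, the same idea can be wrapped in a function and applied to every file. This is only a minimal sketch: the helper name `read_trimmed` is made up for illustration, and it assumes the files sit in the current working directory, are whitespace-separated, and have a header row on their second line (pass `header = FALSE` if they do not).

```r
# Hypothetical helper: read one file, dropping its first and last lines.
read_trimmed <- function(path, header = TRUE, ...) {
  strs <- readLines(path)
  # Keep only the lines strictly between the first and the last one,
  # so the number of data rows never has to be known in advance.
  body <- strs[-c(1, length(strs))]
  read.table(text = body, header = header, ...)
}

# Assumption: all .str files live in the current working directory.
files <- list.files(pattern = "\\.str$")
tables <- lapply(files, read_trimmed)
names(tables) <- files
```

Each element of `tables` is then one data frame, named after the file it came from. Extra arguments such as `sep` are forwarded to `read.table` through `...`.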

Hong Ooi
6

You can put comments anywhere in the data file, in the same way that you put comments in an R script. For example, if I have a data.txt like this:

# comment 1
str1
str2
# comment 2
str3
# comment 3
str4
str5# comment 4
str6
str7
# comment 5

Then you don't need to do anything to skip the comments:

> x<-read.table("data.txt", header=FALSE)
> x
    V1
1 str1
2 str2
3 str3
4 str4
5 str5
6 str6
7 str7
>

Note that comment 4 is not read. You can change the comment character `#` by using the `comment.char` argument.
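
As a small illustration of `comment.char`, the sketch below uses made-up inline data (passed through the `text` argument instead of a file) in which comments start with `%`. It also shows one thing worth keeping in mind: `read.csv` sets `comment.char = ""` by default, so comment handling has to be requested explicitly there.

```r
# Made-up data where comments start with "%" instead of "#".
txt <- c("% comment 1", "str1", "str2% trailing comment", "% comment 2")
x <- read.table(text = txt, header = FALSE, comment.char = "%")

# read.csv disables comments by default (comment.char = ""),
# so "#" has to be named explicitly to skip the first line here.
y <- read.csv(text = c("# comment", "a,b", "1,2"), comment.char = "#")
```

Here `x` should hold the two strings and `y` a single row under the headers `a` and `b`.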

k.c.
0

You can skip arbitrary lines anywhere in the file if you combine the `readLines` approach Hong Ooi gives with negative indexing. Here's an example which skips lines 2-5 in a file that has a header row followed by a number of lines of annotation/meta info:

lines <- readLines('myfile.txt')
# drop lines 2-5 (annotation), keep the header on line 1 and the data rows
mytable <- read.table(text = lines[-c(2:5)], sep = '\t', header = TRUE)
posdef
  • This does not skip the lines. It reads all the lines, and then removes some of them. If the files are large, this is a bad approach. – CoderGuy123 Aug 28 '20 at 01:36
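
If holding every line in a character vector is a concern, one possible alternative for this particular layout (header on line 1, annotation lines right after it) is to read the header line on its own and let `read.table` skip the rest. This is a sketch under that assumption; `read_after_annotations` and `n_annot` are hypothetical names, not part of any answer above.

```r
# Hypothetical helper: header on line 1, then `n_annot` annotation lines to skip.
read_after_annotations <- function(path, n_annot, sep = "\t") {
  # Read only the first line to recover the column names.
  col_names <- scan(path, what = "", sep = sep, nlines = 1, quiet = TRUE)
  # Skip the header and the annotation lines, then parse the remaining rows.
  read.table(path, sep = sep, skip = 1 + n_annot, header = FALSE,
             col.names = col_names)
}

# Equivalent to dropping lines 2-5 in the example above:
# mytable <- read_after_annotations("myfile.txt", n_annot = 4)
```

This only works when the unwanted lines form a contiguous block at the top; for lines scattered through the file, the `readLines` approach or `comment.char` remains the simpler option.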