How can I read the header but also skip lines - read.table()?

Question

Data.txt:

Index;Time;
1;2345;
2;1423;
3;5123;

The code:

dat <- read.table('data.txt', skip = 1, nrows = 2, header =TRUE, sep =';')

The result:

  X1 X2345
1  2  1423
2  3  5123

I expect the header to be Index and Time, as follows:

  Index Time
1   2   1423
2   3   5123

How do I do that?

probably duplicate http://stackoverflow.com/questions/15860071/read-csv-skip-second-line — David Arenburg, May 08 '14 at 14:13
@DavidArenburg indeed, the accepted answer you linked is probably the best approach — Beasterfield, May 08 '14 at 15:08
Have you looked into doing a combination of head() and tail() functions? It might get pretty nested based on how deep you're going, but I believe this will give you what you're looking for. — daneshjai, Apr 15 '15 at 11:47

score 40 · Accepted Answer · edited Jul 04 '18 at 21:52

I am afraid there is no direct way to achieve this. Either read the entire table and then remove the lines you don't want, or read the file twice and assign the header afterwards:

header <- read.table('data.txt', nrows = 1, header = FALSE, sep =';', stringsAsFactors = FALSE)
dat    <- read.table('data.txt', skip = 2, header = FALSE, sep =';')
colnames( dat ) <- unlist(header)
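The two-pass approach can be wrapped in a small helper. This is only a minimal sketch, not part of the original answer: the function name `read_with_header_skip` and its `n_skip` argument are made up for illustration, and the temporary file simply recreates the question's data. The sketch also drops the unnamed column that the trailing `;` produces.

```r
# Hypothetical helper: keep the header line, skip the next n_skip data lines.
read_with_header_skip <- function(file, n_skip, sep = ";", ...) {
  # First pass: read only the header line, as character data.
  header <- read.table(file, nrows = 1, header = FALSE, sep = sep,
                       stringsAsFactors = FALSE, colClasses = "character")
  # Second pass: skip the header line plus n_skip data lines.
  dat <- read.table(file, skip = 1 + n_skip, header = FALSE, sep = sep, ...)
  colnames(dat) <- unlist(header)
  # Drop columns whose header is missing or empty (the trailing ';' here).
  keep <- !is.na(colnames(dat)) & nzchar(colnames(dat))
  dat[, keep, drop = FALSE]
}

# Recreate the question's file in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("Index;Time;", "1;2345;", "2;1423;", "3;5123;"), path)

dat <- read_with_header_skip(path, n_skip = 1)
print(dat)
```

With `n_skip = 1` the result should hold only the second and third data rows under the `Index` and `Time` headers. The file is still opened twice, so this tidies the idea rather than making it faster.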

edited Jul 04 '18 at 21:52 by Hack-R · answered May 08 '14 at 14:10 by Beasterfield

You need to put `, stringsAsFactors=FALSE` in your first line for this to work. – Thomas May 08 '14 at 14:12
@Thomas I agree that this should be done, although I do not really see why it _has_ to be done. At least I do not have an example at hand where this would be necessary. – Beasterfield May 08 '14 at 15:17
This code does not work with the OP's example file without it...at least not on my machine. – Thomas May 08 '14 at 15:27
@Thomas you are right. The reason is (and I am sure you know that) that in the OP's example all lines end with a `;` giving a missing column and column names containing an `NA`. This indeed makes a problem when calling `unlist(header)`. – Beasterfield May 08 '14 at 16:22
fwiw I used `as.is=T` instead of `stringsAsFactors=FALSE` and it seems to have worked just the same. – airstrike Mar 06 '16 at 08:34

score 8 · Answer 2 · edited May 08 '14 at 14:06

You're using skip incorrectly. Try this:

dat <- read.table('data.txt', nrows = 2, header =TRUE, sep =';')[-1, ]

edited May 08 '14 at 14:06 by Thomas · answered May 08 '14 at 14:04 by Andrew Cassidy

That will work if I've got small data sets. But, say, I want to skip 600000 lines and get the first row as my column names. Your code will waste a lot of memory. – hans-t May 08 '14 at 14:08
This still doesn't give the desired output for me, I get only `Index 2 Time 1423` – Csislander May 08 '14 at 14:09
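The single-row result reported in the last comment follows from how `nrows` is counted: it limits the number of data rows read after the header, so `nrows = 2` followed by `[-1, ]` leaves one row. A hedged sketch of the same idea with `nrows = 3`, using a temporary copy of the question's file (the `path` variable is an assumption made for this example):

```r
# Recreate the question's file in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("Index;Time;", "1;2345;", "2;1423;", "3;5123;"), path)

# Read three data rows after the header, then drop the first one.
# Selecting the columns by name also leaves out the column created by the trailing ';'.
dat <- read.table(path, nrows = 3, header = TRUE, sep = ";")[-1, c("Index", "Time")]
print(dat)
```

The row names of the result remain 2 and 3 rather than restarting at 1, and the memory concern raised above still applies, because every skipped row is read before it is discarded.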

score 3 · Answer 3 · answered May 08 '14 at 14:58

A solution using fread from data.table:

require(data.table)
fread("Data.txt", drop = "V3")[-1]

Result:

> fread("Data.txt", drop = "V3")[-1]
   Index Time
1:     2 1423
2:     3 5123

answered May 08 '14 at 14:58 by djhurio

score 3 · Answer 4 · answered Oct 26 '16 at 12:12

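Instead of read.table(), use a readr function such as read_csv(), piped to dplyr::slice(). For a semicolon-delimited file like the one in the question, read_delim() with delim = ";" is the matching reader.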

library(readr)
library(dplyr)
dat <- read_csv("data.txt") %>% slice(-1)

It's very fast too.

answered Oct 26 '16 at 12:12 by Joe

I just discovered this and have a similar situation. How does `readr`'s column specification work since the second row is a different format than the rest of the data? Is there a good way to assign the column types after importing the data? – Andrew Jackson Feb 01 '17 at 17:26
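One possible way to handle the situation in this comment, shown here in base R rather than readr so that it runs without extra packages: read every column as character, drop the odd row, then let R infer the types again. The file contents below (a second line holding unit labels) are invented for the example. readr offers `type_convert()` for the same purpose, but that is not what this sketch uses.

```r
# Hypothetical file whose second line holds labels instead of data.
path <- tempfile(fileext = ".txt")
writeLines(c("Index,Time", "id,seconds", "2,1423", "3,5123"), path)

# Read everything as character so the label row does not affect type guessing.
raw <- read.csv(path, colClasses = "character")

# Drop the label row, then re-infer column types from the remaining values.
dat <- type.convert(raw[-1, ], as.is = TRUE)
str(dat)
```

After the conversion both columns should be integer rather than character.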

score 1 · Answer 5 · answered May 08 '14 at 16:44

You could (in most cases) strip the trailing `;`, write a new file without the second line (which is the first data row, since line 1 is the header), and use read.csv instead of read.table:

> txt <- "Index;Time;
  1;2345;
  2;1423;
  3;5123;" 
> writeLines(sub(";$", "", readLines(textConnection(txt))[-2]), 'newTxt.txt')
> read.csv('newTxt.txt', sep = ";")
##   Index Time
## 1     2 1423
## 2     3 5123

answered May 08 '14 at 16:44 by Rich Scriven

This is very inefficient for large files. I tried this recently, and it was very slow for some smaller files, but it never completed for larger ones. – Max Candocia Aug 22 '17 at 19:48
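For large files, an alternative that avoids rewriting the file is to open a connection once, take the header with `readLines()`, and let `read.table()` continue from the current position. This is a sketch under the assumption that the file looks like the question's example; it is not taken from any of the answers, and the temporary `path` is made up for the demonstration.

```r
# Recreate the question's file in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("Index;Time;", "1;2345;", "2;1423;", "3;5123;"), path)

con <- file(path, open = "r")
# Consume the header line; the connection now points at the first data row.
header_line <- readLines(con, n = 1)
cols <- strsplit(header_line, ";", fixed = TRUE)[[1]]

# read.table continues from the current position; skip = 1 drops one data row.
dat <- read.table(con, skip = 1, header = FALSE, sep = ";")
close(con)

# Keep only the named columns (the trailing ';' adds an empty one) and label them.
dat <- dat[seq_along(cols)]
names(dat) <- cols
print(dat)
```

The file is read in a single pass and the skipped rows are never parsed into a data frame, which may help when the number of lines to skip is large.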
