32

Data.txt:

Index;Time;
1;2345;
2;1423;
3;5123;

The code:

dat <- read.table('data.txt', skip = 1, nrows = 2, header =TRUE, sep =';')

The result:

  X1 X2345
1  2  1423
2  3  5123

I expect the header to be Index and Time, as follows:

  Index Time
1     2 1423
2     3 5123

How do I do that?

zx8754
hans-t
  • 2
    Probably a duplicate of http://stackoverflow.com/questions/15860071/read-csv-skip-second-line – David Arenburg May 08 '14 at 14:13
  • @DavidArenburg Indeed, the accepted answer you linked is probably the best approach. – Beasterfield May 08 '14 at 15:08
  • Have you looked into doing a combination of head() and tail() functions? It might get pretty nested based on how deep you're going, but I believe this will give you what you're looking for. – daneshjai Apr 15 '15 at 11:47

5 Answers

40

I am afraid there is no direct way to achieve this. Either you read the entire table and remove the lines you don't want afterwards, or you read the table twice and assign the header later:

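# First pass: read only the header line, keeping the names as character strings
header <- read.table('data.txt', nrows = 1, header = FALSE, sep = ';', stringsAsFactors = FALSE)
# Second pass: skip the header line and the first data row
dat    <- read.table('data.txt', skip = 2, header = FALSE, sep = ';')
# Use the values from the header line as column names
colnames(dat) <- unlist(header)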
Hack-R
Beasterfield
  • 1
    You need to put `, stringsAsFactors=FALSE` in your first line for this to work. – Thomas May 08 '14 at 14:12
  • @Thomas I agree that this should be done, although I do not really see why it _has_ to be done. At least I do not have an example at hand where this would be necessary. – Beasterfield May 08 '14 at 15:17
  • This code does not work with the OP's example file without it...at least not on my machine. – Thomas May 08 '14 at 15:27
  • 1
    @Thomas you are right. The reason is (and I am sure you know that) that in the OP's example all lines end with a `;`, giving a missing column and column names containing an `NA`. This indeed causes a problem when calling `unlist(header)`. – Beasterfield May 08 '14 at 16:22
  • fwiw I used `as.is=T` instead of `stringsAsFactors=FALSE` and it seems to have worked just the same. – airstrike Mar 06 '16 at 08:34
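For reference, a minimal sketch of the two-pass approach that also deals with the trailing `;` discussed in the comments above. It recreates the question's sample data in a temporary file; the names `path`, `nm`, and `keep` are illustrative and not part of the original answer.

```r
# Recreate the question's sample file in a temporary location (assumption:
# the real file has the same layout, including the trailing ';').
path <- tempfile(fileext = ".txt")
writeLines(c("Index;Time;", "1;2345;", "2;1423;", "3;5123;"), path)

# Pass 1: read only the header line, as character data.
header <- read.table(path, nrows = 1, header = FALSE, sep = ";",
                     stringsAsFactors = FALSE)

# Pass 2: skip the header line and the first data row.
dat <- read.table(path, skip = 2, header = FALSE, sep = ";")

# The trailing ';' produces an empty extra column whose header is NA;
# keep only the columns that have a real name.
nm   <- as.character(unlist(header, use.names = FALSE))
keep <- !is.na(nm) & nzchar(nm)
dat  <- dat[, keep, drop = FALSE]
colnames(dat) <- nm[keep]

print(dat)
```

The skipped rows are never parsed into the data frame, at the cost of opening the file twice.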
8

You're using skip incorrectly. Try this:

dat <- read.table('data.txt', nrows = 2, header =TRUE, sep =';')[-1, ]
Thomas
Andrew Cassidy
  • That will work if I've got small data sets. But, say, I want to skip 600000 lines and get the first row as my column names. Your code will waste a lot of memory. – hans-t May 08 '14 at 14:08
  • This still doesn't give the desired output for me, I get only `Index 2 Time 1423` – Csislander May 08 '14 at 14:09
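The single-row result reported in the comment above follows from how `nrows` is counted: with `header = TRUE` it is the number of data rows read after the header line. A small sketch, assuming the question's sample data written to a temporary file (`path`, `two`, and `three` are illustrative names), compares `nrows = 2` with `nrows = 3`:

```r
# Recreate the question's sample file in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("Index;Time;", "1;2345;", "2;1423;", "3;5123;"), path)

# nrows = 2 reads data rows 1 and 2; dropping the first leaves a single row.
two <- read.table(path, nrows = 2, header = TRUE, sep = ";")[-1, ]

# nrows = 3 reads data rows 1 to 3; dropping the first leaves rows 2 and 3.
three <- read.table(path, nrows = 3, header = TRUE, sep = ";")[-1, ]

print(nrow(two))
print(nrow(three))
print(three[, c("Index", "Time")])
```

Every row up to the target is still parsed before the unwanted ones are discarded, so the memory concern raised above for large skips still applies.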
3

A solution using fread from data.table:

library(data.table)
fread("Data.txt", drop = "V3")[-1]

Result:

> fread("Data.txt", drop = "V3")[-1]
   Index Time
1:     2 1423
2:     3 5123
djhurio
3

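Instead of read.table(), use a readr function such as read_delim() (the file is semicolon-delimited, so read_csv(), which expects commas, would not split the columns), piped to dplyr::slice().

library(readr)
library(dplyr)
dat <- read_delim("data.txt", delim = ";") %>% slice(-1)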

It's very fast too.

Joe
  • 1
    I just discovered this and have a similar situation. How does `readr`'s column specification work since the second row is a different format than the rest of the data? Is there a good way to assign the column types after importing the data? – Andrew Jackson Feb 01 '17 at 17:26
1

You could (in most cases) sub out the trailing `;`, write a new file without the second row (which is really the first data row, because of the header), and use read.csv instead of read.table:

> txt <- "Index;Time;
  1;2345;
  2;1423;
  3;5123;" 
> writeLines(sub(";$", "", readLines(textConnection(txt))[-2]), 'newTxt.txt')
> read.csv('newTxt.txt', sep = ";")
##   Index Time
## 1     2 1423
## 2     3 5123
Rich Scriven
  • 1
    This is very inefficient for large files. I tried this recently, and it was very slow for some smaller files, but it never completed for larger ones. – Max Candocia Aug 22 '17 at 19:48
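A variant sketch that skips the intermediate file by handing the cleaned lines to `read.csv()` through its `text` argument. It assumes the question's sample data in a temporary file (`path` and `lines` are illustrative names). It still reads the whole file with `readLines()`, so it is unlikely to help with the large-file slowness noted in the comment above.

```r
# Recreate the question's sample file in a temporary location.
path <- tempfile(fileext = ".txt")
writeLines(c("Index;Time;", "1;2345;", "2;1423;", "3;5123;"), path)

# Read all lines, drop the first data row (line 2), and strip the trailing ';'.
lines <- readLines(path)
lines <- sub(";$", "", lines[-2])

# Parse the remaining lines in memory instead of writing a new file.
dat <- read.csv(text = paste(lines, collapse = "\n"), sep = ";")

print(dat)
```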