Possible Duplicate:
Only read limited number of columns in R

I have a large CSV file, so I only want to read the relevant data into R. The file is 4 columns wide and several million rows long, but the first column is unnecessary (it is the same string repeated on every row).

Is there a way to read only the 2nd to 4th columns of the CSV file? It's easy enough to remove the first column after reading it in, but I was wondering whether there is a more efficient way of doing this.

h.l.m
  • 8
    This is documented in `?read.csv`... via `colClasses`. – Joshua Ulrich Jan 25 '13 at 17:49
  • 1
    In my experience, if your csv file is large enough, there is a significant speed-up from using the system command `cut` if you're on a `*nix` machine. This is especially true for a csv that has many columns when you're selecting only a subset. – Justin Jan 25 '13 at 17:54
  • 4
    Asked, and Answered here multiple times: http://stackoverflow.com/q/5936188/429846 and http://stackoverflow.com/q/5788117/429846 and http://stackoverflow.com/q/7714997/429846 , and related http://stackoverflow.com/q/13616985/429846 Please do your homework first! – Gavin Simpson Jan 25 '13 at 18:28
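
The `cut` suggestion in the comments above can be sketched from inside R by reading from a pipe. This is only an illustration under stated assumptions: a Unix-like system with `cut` on the PATH, a small hypothetical temporary file standing in for the real data, and no quoted fields that contain commas (a plain `cut -d,` would split those incorrectly).

```r
# Hypothetical stand-in for the real multi-million-row file.
path <- tempfile(fileext = ".csv")
writeLines(c("id,x,y,z",
             "rep,1,2.5,a",
             "rep,2,3.5,b"), path)

# Let `cut` keep only fields 2-4, so R never parses the first column.
# shQuote() protects the path in case it contains spaces.
cmd <- paste("cut -d, -f2-4", shQuote(path))
data <- read.csv(pipe(cmd))

names(data)   # column names of the trimmed data frame
```

Whether this is actually faster than `colClasses` depends on the file and the machine, so it is worth timing both on a sample before committing to one.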

1 Answer

To expand on Joshua's comment:

data <- read.csv("data.csv", colClasses = c("NULL", NA, NA, NA))

"NULL" (note the quotes!) means skip the column; NA means that R chooses the appropriate data type for that column.
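
As a quick way to check this behaviour, here is a small self-contained sketch. The file contents and column names (`id`, `x`, `y`, `z`) are made up for illustration and stand in for the real four-column file.

```r
# Hypothetical four-column file; the first column repeats the same string.
path <- tempfile(fileext = ".csv")
writeLines(c("id,x,y,z",
             "rep,1,2.5,a",
             "rep,2,3.5,b",
             "rep,3,4.5,c"), path)

# "NULL" drops column 1; NA lets read.csv guess the type of columns 2-4.
data <- read.csv(path, colClasses = c("NULL", NA, NA, NA))

str(data)   # inspect which columns were kept and the types R chose
```

The skipped column is never stored in the resulting data frame, which is the saving compared with reading everything and dropping the column afterwards.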

Jonathan Christensen
  • 3
    The `?read.csv` help page says that the `colClasses` vector is recycled, so be careful if you ever use this on tables with more columns. – Señor O Jan 25 '13 at 18:19
  • 1
    I suspect that @h.l.m is confusing rownames and first column values. It may be premature to start guessing at what he really meant without a reproducible example. – IRTFM Jan 25 '13 at 18:24
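
On the recycling caveat raised in the comments: one way to avoid surprises on wider tables is to build a `colClasses` vector with exactly one entry per column. The sketch below assumes a simple comma-separated header with no quoted commas, and uses a made-up six-column file; the names `n_cols` and `classes` are illustrative only.

```r
# Hypothetical wider file: six columns, the first one unwanted.
path <- tempfile(fileext = ".csv")
writeLines(c("id,a,b,c,d,e",
             "rep,1,2,3,4,5",
             "rep,6,7,8,9,10"), path)

# Count the columns from the header line alone (cheap even for huge files).
n_cols <- length(strsplit(readLines(path, n = 1), ",", fixed = TRUE)[[1]])

# One entry per column: skip the first, auto-detect all the others.
classes <- c("NULL", rep(NA, n_cols - 1))
data <- read.csv(path, colClasses = classes)

names(data)   # every column except the first
```

Because the vector has the same length as the number of columns, nothing is left to recycling, so a later column cannot be dropped by accident.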