
I have a large txt file (approx. 2 million rows). The first column is a date in the format 01/01/2006, and the values are separated with a ;

data <- read.table("largeFile.txt", sep=";")

dataToUse <- data[data$Date >= 01/02/2007 && data$Date <= 02/02/2007,]

example row:

16/12/2006;17:36:00;5.224;0.478;232.990;22.400;0.000;1.000;16.000

The above code also doesn't work, but is there a way to subset first and then load only that data into the data variable? The file is large, so it takes some time to load.

Mefhisto1
  • First of all, study `help("&&")` to learn why you should use `&` instead. Then have a look at the `fread` function in package data.table for faster import. – Roland Jun 08 '14 at 08:54
  • Also, have a look at http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r/1820610 – nico Jun 08 '14 at 09:24
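As a rough sketch of the fread() import the first comment suggests (the exact call here is an assumption, not tested against this file; see ?fread for the full argument list):

library(data.table)

# fread() is typically much faster than read.table() on a ~2 million row file
data <- fread("largeFile.txt", sep = ";", header = FALSE)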

1 Answer


For the subset to work, you need quotes around the dates and a single `&` rather than `&&`.

dataToUse <- data[data$Date >= "01/02/2007" & data$Date <= "02/02/2007", ]

You could also use the subset() function.

subset(data, Date >= "01/02/2007" & Date <= "02/02/2007")

Next, if the date column should be a Date class variable, you can set its class with the colClasses argument of read.table(). You can set all the column classes this way if you choose, or just one. Just make sure your dates are in the proper format before using colClasses for Date class variables.
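For example, here is a minimal sketch, assuming the file has no header row and guessing column names (only the first two fields in the example row look like a date and a time; the rest look numeric). Since the dates are in dd/mm/yyyy rather than R's default %Y-%m-%d, it is easiest to read that column as character and convert it with as.Date() afterwards:

# Assumed column names; read the date column as character, then convert it
# to Date so the range comparison works on real dates instead of strings
cols <- c("Date", "Time", "V3", "V4", "V5", "V6", "V7", "V8", "V9")
data <- read.table("largeFile.txt", sep = ";", col.names = cols,
                   colClasses = c("character", "character", rep("numeric", 7)))
data$Date <- as.Date(data$Date, format = "%d/%m/%Y")

dataToUse <- data[data$Date >= as.Date("2007-02-01") &
                  data$Date <= as.Date("2007-02-02"), ]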

Finally, to subset the data before you read it into R, I would recommend using shell/Unix commands in a terminal. Tools such as grep, awk, and sed are easy and quick ways to reduce the data before sending it to R. On Windows I'd recommend downloading Cygwin (it's free and fast); on Linux-based machines just use the terminal.
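As a rough sketch of that idea, assuming awk is on your PATH and that the two dates you want are exactly 01/02/2007 and 02/02/2007, you can even run the filter from within R via pipe(), so only the matching rows are ever read:

# Let awk keep only rows whose first ;-separated field is one of the two dates
# (file name and dates are taken from the question; adjust as needed)
dataToUse <- read.table(
  pipe("awk -F';' '$1 == \"01/02/2007\" || $1 == \"02/02/2007\"' largeFile.txt"),
  sep = ";", stringsAsFactors = FALSE
)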

Rich Scriven