
I want to subset my data set between two distinct dates. I've loaded the data from a text file into R, with ';' as the separator:

x <- read.table("household_power_consumption.txt", sep = ";", header = TRUE)

head(x)

gives me this:

[screenshot of the head(x) output]

The data set contains over 200,000 lines, so I need to subset it to only two particular dates. So I tried this:

x[Date >= as.Date("2007-02-01") | Date <= as.Date("2007-02-02")]

But I see the following error:

Error in `[.data.frame`(x, Date >= as.Date("2007-02-01") | Date <= as.Date("2007-02-02")) : object 'Date' not found

So what is the problem here? How do I subset the data?

  • Try `x[x$Date >= as.Date("2007-02-01") & x$Date <= as.Date("2007-02-02"),]` – RHertel Feb 07 '16 at 20:15
  • By default, `[.data.frame` does not operate within its frame. You need to replace `Date` with `x$Date`, or switch to, e.g., `data.table`, which _does_ operate within frame, or use `with`. – MichaelChirico Feb 07 '16 at 20:16
  • Also, 200,000 lines is getting into [`fread` territory](http://stackoverflow.com/a/15058684/3576984). – MichaelChirico Feb 07 '16 at 20:17
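As the comments note, base R's `[` does not look up column names inside the data frame. Below is a minimal sketch of the `with()` route mentioned above, plus base R's `subset()`, which also evaluates its condition inside the frame. The toy data frame `x` and its `Power` column are made up for illustration, and `Date` is assumed to already be of class `Date`.

```r
# Hypothetical toy data standing in for the real file
x <- data.frame(
  Date  = as.Date(c("2007-01-31", "2007-02-01", "2007-02-02", "2007-02-03")),
  Power = c(1.5, 2.0, 2.5, 3.0)
)

# with() evaluates the expression inside x, so `Date` is found;
# the trailing comma in x[keep, ] selects rows and keeps all columns
keep <- with(x, Date >= as.Date("2007-02-01") & Date <= as.Date("2007-02-02"))
via_with <- x[keep, ]

# subset() also evaluates its condition inside the data frame
via_subset <- subset(x, Date >= as.Date("2007-02-01") & Date <= as.Date("2007-02-02"))

print(via_with)
print(via_subset)
```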

1 Answer


There are a couple of problems in your code.

  1. As discussed in the comments, `[` does not look up the column Date inside the data frame, which is why R reports "object 'Date' not found". The simplest fix is to write x$Date instead of Date.
  2. You want to select two specific dates. For this you can either use

    x$Date == as.Date("2007-02-01") | x$Date == as.Date("2007-02-02")
    

    (connected with a logical OR), or

    x$Date >= as.Date("2007-02-01") & x$Date <= as.Date("2007-02-02")
    

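    (connected with a logical AND). The version in your code combines >= and <= with OR. Since every date is either on or after 2007-02-01 or on or before 2007-02-02, that condition is always TRUE and selects every row.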

  3. You did not specify which column(s) to select. Presumably the goal is to keep the entire rows that meet your criterion. For this, you need to add a comma after the condition, just before the closing square bracket (x[condition, ]). With a single index and no comma, `[` selects columns rather than rows.
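A small illustration of points 2 and 3, using a made-up vector of four consecutive dates (the data frame and its Power column are hypothetical). For two consecutive days, the OR-of-equalities and the AND range pick the same rows, while the original OR-of-inequalities is TRUE everywhere.

```r
d  <- as.Date(c("2007-01-31", "2007-02-01", "2007-02-02", "2007-02-03"))
lo <- as.Date("2007-02-01")
hi <- as.Date("2007-02-02")

cond_or_equal  <- d == lo | d == hi   # exact match on either date: FALSE TRUE TRUE FALSE
cond_and_range <- d >= lo & d <= hi   # inclusive range:            FALSE TRUE TRUE FALSE
cond_question  <- d >= lo | d <= hi   # condition from the question: TRUE TRUE TRUE TRUE

# Hypothetical data frame; the trailing comma selects rows, keeping all columns
x <- data.frame(Date = d, Power = c(1.5, 2.0, 2.5, 3.0))
two_days <- x[cond_and_range, ]
print(two_days)
```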

Edit:

Since I don't know the format in which the column x$Date is stored, it may help to wrap it in as.Date(), too.

In summary, this should probably work:

x[as.Date(x$Date) >= as.Date("2007-02-01") & as.Date(x$Date) <= as.Date("2007-02-02"),]
RHertel
  • I have tried this but this is the warning message that I see: incompatible methods ("Ops.factor", "Ops.Date") for the ">=" ; Incompatible methods ("Ops.factor", "Ops.Date") for the "<=" –  Feb 07 '16 at 20:41
  • You didn't post your data; it might help to know the output of `class(x$Date)`. But you could probably resolve this by wrapping the date in `as.Date()`, like `as.Date(x$Date) >= as.Date("2007-02-01")...` – RHertel Feb 07 '16 at 20:47
  • Alternatively, you can try the option `stringsAsFactors=FALSE` when reading the file with `read.table()`. But I find the version with `as.Date()` on both sides of the `>=` and `<=` signs better. – RHertel Feb 07 '16 at 20:53
  • Had to add the factor parameter in the as.Date() function like as.Date(x$Date, factor = "%d/%m/%Y") >= as.Date("2007-02-01").. Thanks! –  Feb 07 '16 at 20:56
  • You're welcome...I assume it was `format= "%d/%m/%Y"`. If the answer was useful to solve your problem, please consider accepting it by clicking on the tick on the left. – RHertel Feb 07 '16 at 20:58
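Putting the comment thread together, here is a minimal sketch of the working approach. It assumes, as the last comments suggest, that the file stores dates as day/month/year. The inline data below is made up and only mimics the file's ';'-separated layout (the column names are assumptions). With the real file, you would pass the file name instead of `text = raw`.

```r
# Made-up lines in the same shape as the question's file:
# ';'-separated, dates written as day/month/year (assumed from the comments)
raw <- "Date;Time;Global_active_power
31/1/2007;23:59:00;1.5
1/2/2007;00:00:00;2.0
2/2/2007;12:00:00;2.5
3/2/2007;00:00:00;3.0"

# stringsAsFactors = FALSE keeps Date as character rather than factor,
# avoiding the Ops.factor / Ops.Date clash reported above
x <- read.table(text = raw, sep = ";", header = TRUE, stringsAsFactors = FALSE)

# Convert once, with an explicit format, so the comparisons are Date vs Date
x$Date <- as.Date(x$Date, format = "%d/%m/%Y")

two_days <- x[x$Date >= as.Date("2007-02-01") & x$Date <= as.Date("2007-02-02"), ]
print(two_days)
```

Converting the column once, rather than calling as.Date() inside every comparison, also avoids re-parsing the strings on each subset of a large file.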