
Thanks in advance for taking the time to read and answer this. I have a data frame (15264 rows and 3 columns) whose head is:

head(actData)
      steps  date      interval
289     0 2012-10-02        0
290     0 2012-10-02        5
291     0 2012-10-02       10
292     0 2012-10-02       15
293     0 2012-10-02       20
294     0 2012-10-02       25

The "date" variable is a factor with 53 dates. I want to split the data by date, calculate the mean of steps per date, and then plot interval vs. the mean steps. Here is what I have done:

library(plyr)  # ddply() comes from the plyr package
mn <- ddply(actData, "date", function(x) apply(x[1], 2, mean))  # mean of steps per day (53 rows)
splt <- split(actData, actData$date)  # split the data by date (it should divide the data into 53 parts)
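For comparison, here is a base-R sketch of the same two steps on a small hypothetical stand-in for `actData` (toy values, not the real data; `aggregate()` replaces `ddply()` only to avoid the plyr dependency). It shows that the mean step returns a data frame with one row per date, while `split()` returns a list of data frames, one per date, not a data frame with an `interval` column.

```r
# Hypothetical toy data shaped like actData: 3 dates x 4 five-minute intervals
actData <- data.frame(
  steps    = c(0, 2, 4, 6,  1, 3, 5, 7,  2, 4, 6, 8),
  date     = factor(rep(c("2012-10-02", "2012-10-03", "2012-10-04"), each = 4)),
  interval = rep(c(0, 5, 10, 15), times = 3)
)

# Step 1: mean steps per date (base-R analogue of the ddply() call)
mn <- aggregate(steps ~ date, data = actData, FUN = mean)
print(mn)  # data frame: one row per date, columns date and steps

# Step 2: split by date, which gives a *list* of data frames, one per date
splt <- split(actData, actData$date)
print(class(splt))  # "list"
print(names(splt))  # the list elements are named by date
print(nrow(mn) == length(splt))  # both have one entry per date here
```

The two objects have the same length, but `mn` is a table of numbers while `splt` is a list of whole data frames, so they cannot be passed to `plot()` as a matching x/y pair.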

Now I have two objects of the same length (53), but when I try to plot them, I get an error saying their lengths differ:

plot(splt$interval, mn[,2], type="l")  
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

When I check the length of splt$interval, it returns 0! I've also looked at "How to split a data frame by rows, and then process the blocks?", "Split data based on column values and create scatter plot." and others; they contain many good suggestions, but none of them addresses my question. Sorry if my question is a little naive; I am not an R expert :)

I am using Windows 7 and R 3.0.1 in RStudio. Thanks.

EDIT:

head(splt, 2)
$`2012-10-01`
[1] steps    date     interval
<0 rows> (or 0-length row.names)

$`2012-10-02`
     steps   date     interval
289     0 2012-10-02        0
290     0 2012-10-02        5
291     0 2012-10-02       10
292     0 2012-10-02       15 

head(mn)
    date    steps
1 2012-10-02  0.43750
2 2012-10-03 39.41667
3 2012-10-04 42.06944
4 2012-10-05 46.15972
5 2012-10-06 53.54167
6 2012-10-07 38.24653
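The empty `2012-10-01` element in `head(splt, 2)` is how `split()` behaves when the grouping variable is a factor: it returns one element per factor *level*, including levels that no longer occur in the data (presumably here because some rows were removed after `date` was created as a factor). So `splt` may well have more than 53 elements. A minimal sketch with hypothetical toy data; `drop = TRUE` in `split()`, or `droplevels()` on the factor, removes the empty groups:

```r
# Hypothetical toy data: the factor has a level ("2012-10-01") with no rows,
# as happens when rows are filtered out after the factor was created
dates <- factor(c("2012-10-02", "2012-10-02", "2012-10-03"),
                levels = c("2012-10-01", "2012-10-02", "2012-10-03"))
actData <- data.frame(steps = c(0, 4, 6), date = dates, interval = c(0, 5, 0))

splt_all  <- split(actData, actData$date)               # keeps the empty level
splt_drop <- split(actData, actData$date, drop = TRUE)  # drops empty levels
actData$date <- droplevels(actData$date)                # or fix the factor itself

print(sapply(splt_all, nrow))  # 2012-10-01 has 0 rows
print(names(splt_drop))        # only the dates that actually have rows
print(levels(actData$date))
```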
  • It could have something to do with `levels`, but could you print the head of `splt` and `mn`, or better (IMO), their `dput()` output? – llrs Feb 19 '14 at 13:38
  • `splt$interval` is NULL, because `splt` is a list that contains data frames, so you'd have to do something like `splt[[1]]$interval` to get the intervals. That said, do you want to plot the mean steps vs. the mean interval? Or for every interval in a given day, that value against the mean of steps for that day? It seems like you need to summarize `interval` as well as `steps`. – BrodieG Feb 19 '14 at 13:42
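BrodieG's point can be checked directly: `$` on a list looks up an element *named* `interval`, but the elements of `splt` are named by date, so the lookup returns `NULL`, whose length is 0. A short sketch with hypothetical toy data:

```r
# Hypothetical toy data shaped like actData
actData <- data.frame(
  steps    = c(0, 2, 1, 3),
  date     = factor(c("2012-10-02", "2012-10-02", "2012-10-03", "2012-10-03")),
  interval = c(0, 5, 0, 5)
)
splt <- split(actData, actData$date)

print(is.null(splt$interval))      # TRUE: no list element is named "interval"
print(length(splt$interval))       # 0, which is why plot() reports differing lengths
print(splt[[1]]$interval)          # intervals of the first date: 0 5
print(splt[["2012-10-03"]]$steps)  # steps of one date, selected by name: 1 3

# Pull the interval column out of every per-date data frame at once
ints <- lapply(splt, `[[`, "interval")
print(ints)
```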

1 Answer

I want to split the data based on date, calculate the mean of the steps/date and then create a plot for interval vs. steps' mean;

After step 2 (computing the mean steps per date), you will have a data frame like this:

      mean(steps)  date     
289     0.23     2012-10-02
290     0.42     2012-10-03
291     0.31     2012-10-04

You want to plot this against "the intervals", but there are many intervals per date. What exactly are you trying to plot as x vs. y?

  • The mean steps per date?
  • The mean steps vs. the mean interval (i.e., one x-y point per date)?
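Both readings can be sketched in base R. The data below are a hypothetical stand-in for `actData`, and `tapply()` is used instead of `ddply()` only to keep the example dependency-free.

```r
# Hypothetical toy data: 3 dates x 4 five-minute intervals
actData <- data.frame(
  steps    = c(0, 2, 4, 6,  1, 3, 5, 7,  2, 4, 6, 8),
  date     = factor(rep(c("2012-10-02", "2012-10-03", "2012-10-04"), each = 4)),
  interval = rep(c(0, 5, 10, 15), times = 3)
)

# Option 1: mean steps per date (one bar per date)
stepsByDate <- tapply(actData$steps, actData$date, mean)
barplot(stepsByDate, xlab = "date", ylab = "mean steps")

# Option 2: mean steps vs. mean interval, one x-y point per date
intervalByDate <- tapply(actData$interval, actData$date, mean)
plot(intervalByDate, stepsByDate, xlab = "mean interval", ylab = "mean steps")
```

If every date covers the same set of intervals, the mean interval is identical for all dates (7.5 in the toy data), so option 2 collapses onto a single x value; that is a hint it is probably not the plot being asked for.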
  • I am asked to make a plot of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)... Obviously, I cannot plot the mean steps against all the intervals because of the length difference, so plotting the mean steps vs. the intervals split by date seems reasonable! – user2888374 Feb 19 '14 at 15:28
  • Actually, I have to create a time series plot of them, so maybe this code addresses my question: `xyplot(actData$steps ~ actData$interval | actData$date, type = "l")`. Right? But it still doesn't apply the mean function to steps!! – user2888374 Feb 19 '14 at 17:14
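Going by the assignment wording quoted in the comments (5-minute interval on the x-axis, steps averaged across all days on the y-axis), the grouping variable would be `interval`, not `date`: average over the days within each interval. That gives one value per interval (288 for full days of 5-minute intervals, consistent with 15264 = 53 × 288) and an x vector of matching length. A minimal base-R sketch on hypothetical toy data; with the real data, `actData` would be used directly:

```r
# Hypothetical toy data: 3 dates x 4 five-minute intervals
actData <- data.frame(
  steps    = c(0, 2, 4, 6,  1, 3, 5, 7,  2, 4, 6, 8),
  date     = factor(rep(c("2012-10-02", "2012-10-03", "2012-10-04"), each = 4)),
  interval = rep(c(0, 5, 10, 15), times = 3)
)

# Average steps across all days for each interval (one row per interval)
avgByInterval <- aggregate(steps ~ interval, data = actData, FUN = mean)
print(avgByInterval)

# x and y now come from the same data frame, so their lengths match
plot(avgByInterval$interval, avgByInterval$steps, type = "l",
     xlab = "5-minute interval", ylab = "mean steps across days")
```

The same averaged data frame could also be passed to lattice, e.g. `xyplot(steps ~ interval, data = avgByInterval, type = "l")`; the `| date` conditioning in the comment draws one panel of raw values per day instead of averaging. If `steps` contains `NA`s, the formula method of `aggregate()` drops those rows by default (`na.action = na.omit`).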