For a project, I have a script that reads dates from a CSV file:

reviewFile = "csvfile"
reviews = read.csv(reviewFile, stringsAsFactors=FALSE, sep = '\t')
names(reviews) <- c("date","rating","helpful","total","title","text")

reviews$date <- as.Date(reviews$date, "%Y-%m-%d") 
reviews$month <- as.Date(cut(reviews$date, breaks = "month"))

aggr <- aggregate(reviews$rating ~ reviews$month, FUN = mean)
# count() comes from the plyr package, so library(plyr) must be loaded first
aggr$freq = count(reviews$month)$freq
names(aggr) <- c("month","avg_rating","num_ratings") 

par(mfrow=c(2,1))

plot(aggr$month, aggr$num_ratings, type="s", xlab="Month", ylab="# ratings")

As you can see, this reads a CSV file and plots the number of ratings per month. How can I use a for() loop here so that only reviews from 2010 (or 2010-01) onward are included in the graph?


Below is an example of one input row; all rows have the same structure. I am new to R. Do I need an if statement on the date variable?

2003-09-30  5.0 1   2   2-Fast-2-Furious-(Widescreen-Edition)   THIS MOVIE KEEPS U ON THE EDGE OF U'R SEAT BUT AT THE SAME TIME MAKES U LAUGH.. I CAN'T C Y PEEP'S DITCH THIS MOVIE OR THE FIRST UNLESS THEY WERE BORN UNDER A ROCK.... PAUL WALKER RETURNS & DOES AN X-CELENT JOB ... TYRESE A NEW UPCOMING STAR ALSO PROVIDES AN X-CELENT JOB AS HIS CO-PARTNER 
Bas
Martijn
  • You don't need a `for` loop. Just index the data using the `date` variable to find `date`s that are greater than or equal to the date you want to start at. If you couple that with the `formula` method for `plot` (`plot(num_ratings ~ month, data = aggr)`) then you could use that method's `subset` argument to do the subsetting directly. But without a reproducible example (i.e. data I can work with) you'll have to solve this problem yourself. – Gavin Simpson Oct 22 '15 at 22:32
  • 2003-09-30 5.0 1 2 2-Fast-2-Furious-(Widescreen-Edition) THIS MOVIE KEEPS U ON THE EDGE OF U'R SEAT BUT AT THE SAME TIME MAKES U LAUGH.. I CAN'T C Y PEEP'S DITCH THIS MOVIE OR THE FIRST UNLESS THEY WERE BORN UNDER A ROCK.... PAUL WALKER RETURNS & DOES AN X-CELENT JOB ... TYRESE A NEW UPCOMING STAR ALSO PROVIDES AN X-CELENT JOB AS HIS CO-PARTNER . This is an example of a data input. They all have the same build up. I am new to R, do i have to make an if statement in the date variable? – Martijn Oct 22 '15 at 22:38
  • @Martijn Instead of leaving your example data in a comment, please edit it into the question itself with the "edit" button. You can read about how to share reproducible examples for [r] questions at http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – josliber Oct 22 '15 at 22:39
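
The `subset` suggestion in the first comment can be sketched as follows. This is only an illustration under stated assumptions: the `aggr` data frame below is made-up data standing in for the question's monthly summary, and the 2010-01-01 cutoff is taken from the question text.

```r
# Hypothetical monthly summary standing in for the question's `aggr`
aggr <- data.frame(
  month = seq(as.Date("2009-07-01"), by = "month", length.out = 12),
  num_ratings = c(3, 5, 2, 8, 6, 4, 7, 9, 1, 5, 6, 2)
)

# Logical index: TRUE for months on or after the cutoff (no loop needed)
keep <- aggr$month >= as.Date("2010-01-01")

# Formula method of plot(); `subset` is evaluated inside `data`,
# so only the rows from 2010-01 onward are drawn
plot(num_ratings ~ month, data = aggr,
     subset = month >= as.Date("2010-01-01"),
     type = "s", xlab = "Month", ylab = "# ratings")
```

Indexing with `aggr[keep, ]` and passing that as `data` should draw the same rows; the `subset` argument just avoids creating the intermediate data frame.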

1 Answer

# build some example data
dat <- data.frame(date= as.Date(12000:12500, origin = "1970-01-01"),
                  val= rnorm(501))
dat$mo <- as.Date(cut(dat$date, breaks= "month"))

# use OP's aggregation on ex data
aggr <- aggregate(dat$val ~ dat$mo, FUN = mean) 
aggr$freq = as.integer(table(dat$mo))
names(aggr) <- c("month","avg_rating","num_ratings") 

par(mfrow= c(1,2))
# base plot
plot(num_ratings ~ month, data= aggr, 
     type="s", xlab="Month", ylab="# ratings")

# subset based on date
plot(num_ratings ~ month, data= aggr[aggr$month >= as.Date("2003-01-01", format= "%Y-%m-%d"),], 
     type="s", xlab="Month", ylab="# ratings")

Left: original plot; right: subset plot.
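
The same comparison can also be applied before aggregating, so that the monthly averages and counts are computed only from the reviews of interest. The sketch below assumes a `reviews` data frame with `date` and `rating` columns as in the question; the data itself is randomly generated for illustration, and the 2010-01-01 cutoff is the one the question mentions.

```r
# Hypothetical stand-in for the question's `reviews` data frame
set.seed(1)
reviews <- data.frame(
  date = as.Date("2009-01-01") + sample(0:729, 200, replace = TRUE),
  rating = sample(1:5, 200, replace = TRUE)
)

# Keep only reviews dated 2010-01-01 or later; no loop or if statement needed
recent <- reviews[reviews$date >= as.Date("2010-01-01"), ]
recent$month <- as.Date(cut(recent$date, breaks = "month"))

# Same aggregation as above, but on the filtered rows only
aggr <- aggregate(rating ~ month, data = recent, FUN = mean)
aggr$num_ratings <- as.integer(table(recent$month))
names(aggr) <- c("month", "avg_rating", "num_ratings")

plot(num_ratings ~ month, data = aggr,
     type = "s", xlab = "Month", ylab = "# ratings")
```

Filtering first or subsetting `aggr` afterwards gives the same monthly values when the cutoff falls on the first day of a month; filtering first simply skips the work for rows that will never be plotted.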

alexwhitworth