
I have a data set of daily motor vehicle crashes in NYC from 1 Jan 2014 to 31 Dec 2014. I want to plot monthly time series of the number of injured cyclists and motorists in a single plot.

My data looks like this:

    Date      Time   Location   Cyclists injured  Motorists injured
2014-1-1     12:05      Bronx                  0                  1
2014-1-1     12:34      Bronx                  1                  2
2014-1-2      6:05      Bronx                  0                  0
2014-1-3      8:01      Bronx                  1                  2
2014-1-3     12:05  Manhattan                  0                  1
2014-1-3     12:56  Manhattan                  0                  2

and so on till 31 Dec 2014.

Now, to plot a monthly time series, I understand I first need to total each injury column for each month and then plot the monthly totals. But I do not know how to do this.

I tried the aggregate function with this code, but it gives me the sum for each day, not for each month. Please help.

cyclist <- aggregate(NUMBER.OF.CYCLIST.INJURED ~ DATE, data = final_data,sum)

Thank you :)

micstr
Mannat M
  • Try `%Y` instead of `%y`. – David Arenburg Apr 27 '15 at 20:59
  • No, it's still giving the same wrong results – Mannat M Apr 27 '15 at 21:18
  • I don't think so. `as.Date("1/1/2014" , "%m/%d/%Y")` works just fine. – David Arenburg Apr 27 '15 at 21:25
  • Please be more specific. (1) Show the wrong result, how you got it and what you expected. (2) Provide your data in a reproducible form by showing the output of, say, `dput(head(final_data))`. (3) The question asks for a pedestrian time series but there is no pedestrian data in your data frame. (4) Are you looking to sum each numeric column by Date and then plot the sums against Date, ignoring the Time and Location columns? – G. Grothendieck Apr 27 '15 at 22:45
  • @MannatM: You offer two different date formats: `1/19/2014` and `31 dec 2014`. Which one is it? – IRTFM Apr 27 '15 at 23:06
  • Do you have some weird text in the data frame or missing values? – Chris Bail Apr 27 '15 at 21:23
  • Why didn't you correct the error in your format string, show your plotting code and report any error messages or erroneous results in full thereafter? – IRTFM Apr 28 '15 at 06:18
  • Mannat, you need a new field that holds just the month of each date, like Jan, which you can then aggregate on. See my answer below, where I create a PlotDate to help you with this – micstr Apr 28 '15 at 06:51
  • @BondedDust using the instructions given by you guys, I plotted it again. It works fine now. And to be honest, I didn't change anything. I used the same code again. I reloaded the dataset and it worked; that's why I couldn't attach any image of the wrong answer and so edited my question. – Mannat M Apr 28 '15 at 06:52
  • @Mannat, R is case sensitive so be careful with your variable names - Date and DATE - are not the same variable. – micstr Apr 28 '15 at 07:33
  • @micstr Yes, thank you. But DATE in my case is the name of the column and Date is an R keyword. – Mannat M Apr 28 '15 at 07:50
  • Your listing of the data structure has "Date" as a column name and your code uses "DATE". It's true that "Date" is an R class name but that would not have caused an error, whereas spelling the column name as "DATE" when it really was "Date" would have caused an error. – IRTFM Apr 28 '15 at 14:18

1 Answer


Mannat, here is an answer using the data.table package to help you aggregate. Use install.packages("data.table") first to get it into your R.

library(data.table)

# For others
#   I copied your data into a csv file, Mannat you will not need this step,
#   other helpers look at data in DATA section below 
final_data <- as.data.table(read.csv(file.path(mypath, "SOaccidents.csv"),
                                     header = TRUE,
                                     stringsAsFactors = FALSE))
# For Mannat
# Mannat you will need to convert your existing data.frame to data.table
final_data <- as.data.table(final_data)

# check data formats, dates are strings 
# and field is Date not DATE
str(final_data)

final_data$Date <- as.Date(final_data$Date, "%m/%d/%Y")

# use data table to aggregate on months 
# First lets add a field plot date with Year and Month YYYYMM 201401
final_data[, PlotDate := as.numeric(format(Date, "%Y%m"))] 

# key by this plot date
setkeyv(final_data, "PlotDate")

# second we aggregate with by , and label columns
plotdata <- final_data[, .(Cyclists.monthly  = sum(Cyclists.injured), 
                           Motorists.monthly = sum(Motorists.injured)), by = PlotDate]

#   PlotDate Cyclists.monthly Motorists.monthly
#1:   201401                2                 8

# You can then plot this (makes more sense with more data)
# for example, for cyclists
plot(plotdata$PlotDate, plotdata$Cyclists.monthly)

Mannat, if you are not familiar with data.table, please see the cheatsheet.
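For readers who would rather stay in base R, the same monthly totals can be sketched with `aggregate()` by grouping on a year-month label instead of the full date. This is only a minimal sketch: it assumes the column names from the DATA section of this answer (`Date`, `Cyclists.injured`, `Motorists.injured`), not the names in the asker's real file, and the `Month` column is a helper added here for illustration.

```r
# Toy data mirroring the DATA section (assumed column names, not the asker's real file)
final_data <- data.frame(
  Date = c("01/01/2014", "01/01/2014", "01/01/2014",
           "01/01/2014", "1/19/2014", "1/19/2014"),
  Cyclists.injured  = c(0L, 1L, 0L, 1L, 0L, 0L),
  Motorists.injured = c(1L, 2L, 0L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Parse the strings as dates, then derive a year-month label such as "2014-01"
final_data$Date  <- as.Date(final_data$Date, "%m/%d/%Y")
final_data$Month <- format(final_data$Date, "%Y-%m")  # helper column (illustrative)

# cbind() on the left-hand side sums both injury columns in one call
monthly <- aggregate(cbind(Cyclists.injured, Motorists.injured) ~ Month,
                     data = final_data, FUN = sum)
print(monthly)
```

Grouping on `Month` rather than `Date` is the only change from the daily `aggregate()` call in the question; every row of the same calendar month then falls into one group.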

DATA

For others looking to work on this, here is the result from dput:

final_data <- data.table(Date = c("01/01/2014", "01/01/2014", "01/01/2014", 
                        "01/01/2014", "1/19/2014", "1/19/2014"), 
                        Time = c("12:05", "12:34","06:05", "08:01", "12:05", "12:56"),
                        Location = c("Bronx", "Bronx","Bronx", "Bronx", 
                            "Manhattan", "Manhattan"),
                        Cyclists.injured = c(0L, 1L, 0L, 1L, 0L, 0L),
                        Motorists.injured = c(1L, 2L, 0L, 2L, 1L, 2L))

PLOTS

Either use the ggplot2 package, or for base R plots please see "Plot multiple lines (data series) each with unique color in R" for plotting help.

# I do not have your full data so one point line charts not working
# I needed another month for testing, so added a fake February
testfeb <- data.table(PlotDate = 201402, Cyclists.monthly = 4,
                      Motorists.monthly = 10)
plotdata <- rbindlist(list(plotdata, testfeb))

# PlotDate  Cyclists.monthly    Motorists.monthly
#1  201401                 2                    8
#2  201402                 4                   10

# Plot code, modify the limits as you see fit
plot(1, type = "n",
     xlim = c(201401,201412), 
     ylim = c(0, max(plotdata$Motorists.monthly)),
     ylab = 'monthly accidents',
     xlab = 'months')

lines(plotdata$PlotDate, plotdata$Motorists.monthly, col = "blue")
lines(plotdata$PlotDate, plotdata$Cyclists.monthly, col = "red")

# to add legend
legend(x = "topright", legend = c("Motorists","Cyclists"),
       lty = c(1, 1), lwd = c(2.5, 2.5),  # one entry per legend item
       col = c("blue", "red"))
# or set x to another keyword position, e.g. "bottom" or "bottomleft"

Accident Plot Example with Legend
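As a rough illustration of the ggplot2 route mentioned above, the two monthly series could be stacked into long format and drawn as coloured lines. This is a sketch under assumptions, not part of the original answer's code: it assumes ggplot2 is installed, reuses the two-row `plotdata` (January plus the fake February), and the `Month`, `Group` and `Injured` names are made up for illustration. It also converts the numeric `PlotDate` into a real `Date`, so the x axis is spaced by time rather than by the YYYYMM number.

```r
library(ggplot2)

# Monthly totals as in the answer, plus the fake February used for testing
plotdata <- data.frame(PlotDate = c(201401, 201402),
                       Cyclists.monthly  = c(2, 4),
                       Motorists.monthly = c(8, 10))

# Turn YYYYMM into a Date (first day of the month) for a time-scaled x axis
plotdata$Month <- as.Date(paste0(plotdata$PlotDate, "01"), "%Y%m%d")

# Long format: one row per month and road-user group (illustrative column names)
long <- data.frame(
  Month   = rep(plotdata$Month, times = 2),
  Group   = rep(c("Cyclists", "Motorists"), each = nrow(plotdata)),
  Injured = c(plotdata$Cyclists.monthly, plotdata$Motorists.monthly)
)

# One line per group; the colour legend is generated automatically
p <- ggplot(long, aes(x = Month, y = Injured, colour = Group)) +
  geom_line() +
  scale_colour_manual(values = c(Cyclists = "red", Motorists = "blue")) +
  labs(x = "months", y = "monthly accidents")

print(p)
```

With a full year of data the same code applies unchanged, since each extra month only adds rows to `plotdata`.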

micstr
  • Thank you for the help. But I encounter the following error : Error in `[.data.frame`(final_data, , `:=`(PlotDate, as.numeric(format(DATE, : could not find function ":=" – Mannat M Apr 28 '15 at 07:02
  • 1) Did you run install.packages("data.table")? 2) You need to load data.table with the library command. 3) You need to convert your data frame to a data.table, e.g. final_data <- as.data.table(final_data), which I have added to the answer – micstr Apr 28 '15 at 07:05
  • How can I make it lines of different colors instead of dots? – Mannat M Apr 28 '15 at 07:36
  • Hey Micstr, I see from here that we can plot them separately for each of them. However, I'm looking to plot the monthly time series for each of them in the same plot. Can you please help with that? – Mannat M Apr 28 '15 at 07:52
  • Please look at ?plot to understand the plot function. To do a line `plot(x, y, type = "l", col = "red")` – micstr Apr 28 '15 at 07:56
  • I'm trying to add a legend to my graph and I used the following code: legend(2000,9.5, c("Motorists","Cyclists","Pedestrians"), lty=c(1,1,1), lwd=c(2.5,2.5,2.5),col=c("blue","red","dark green")) However, it's not showing on the graph. – Mannat M Apr 28 '15 at 21:56
  • Please see the updated chart for a working legend. Your 2000 and 9.5 refer to x and y coordinates on the plot axes, and x = 2000 lies far outside the 201401 to 201412 range. I used the simpler keyword position instead. See the help in `?legend` and look at x (third paragraph in the details of the help). – micstr Apr 30 '15 at 06:07