I have a dataset from a dot.gov website that I have to analyze as part of a school project. It contains a lot of information, but I am focusing only on crashes and injuries. How do I count the number of crashes or injuries for a range of years, for example 2007 to 2014?

Do I have to subset my data per year or is there a more efficient way to do it? Thank you!

Below is a sample of my dataset: Sample dataset

Pinaypy
  • Instead of pasting an image of your dataset, can you provide a reproducible example of it using `dput`, as described in this link: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example ? – dc37 Mar 15 '20 at 18:14

1 Answer

Without a reproducible example of your dataset to test against, it is hard to be sure this will work, but with the dplyr and lubridate packages you can try the following (assuming your dataset is called df):

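library(dplyr)
library(lubridate)
# Parse YEARTXT as a date, extract the year, keep 2007-2014, then total both columns.
df %>% mutate(YEARTXT = ymd(YEARTXT)) %>%
  mutate(Year = year(YEARTXT)) %>%
  filter(Year %in% 2007:2014) %>%
  # na.rm = FALSE (the default) returns NA if any INJURED value is missing; use TRUE to skip NAs.
  summarise(INJURED = sum(INJURED, na.rm = FALSE),
            CRASH = sum(CRASH == "Y"))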

To get the crash and injury counts per year, add group_by to the pipeline, for example:

df %>% mutate(YEARTXT = ymd(YEARTXT)) %>%
  mutate(Year = year(YEARTXT)) %>%
  filter(Year %in% 2007:2014) %>%
  # Group after filtering so summarise() returns one row per remaining year.
  group_by(Year) %>%
  summarise(INJURED = sum(INJURED, na.rm = FALSE),
            CRASH = sum(CRASH == "Y"))
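
Since the real dataset is not available here, the per-year pipeline can be tried on a small made-up data frame. This is only a sketch under assumptions: the values below are invented, YEARTXT is assumed to hold ISO-style date strings that ymd() can parse, and na.rm = TRUE is used (unlike the code above) so that a missing INJURED value does not turn a yearly total into NA.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: column names follow the answer, values are invented.
df <- data.frame(
  YEARTXT = c("2006-05-01", "2007-03-15", "2007-11-02",
              "2010-07-20", "2014-01-09", "2015-06-30"),
  CRASH   = c("Y", "Y", "N", "Y", "N", "Y"),
  INJURED = c(2, 1, 0, 3, NA, 4),
  stringsAsFactors = FALSE
)

per_year <- df %>%
  mutate(Year = year(ymd(YEARTXT))) %>%   # parse the date, then extract the year
  filter(Year %in% 2007:2014) %>%         # drops the 2006 and 2015 rows
  group_by(Year) %>%                      # one group per remaining year
  summarise(INJURED = sum(INJURED, na.rm = TRUE),        # injuries, skipping NAs
            CRASH   = sum(CRASH == "Y", na.rm = TRUE))   # rows flagged as a crash

print(per_year)
```

Dropping the group_by(Year) line from this sketch gives a single row with the totals over the whole 2007 to 2014 range instead of one row per year.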

If this is not working, please provide a reproducible example of your dataset: How to make a great R reproducible example

dc37
  • This worked! I have an additional question: how do I get my total crashes and injuries per year using code similar to this? Thank you! – Pinaypy Mar 16 '20 at 00:04
  • You're welcome ;). I edited my answer to provide you a way to get the count per year. Let me know if it is working. – dc37 Mar 16 '20 at 00:09