I have a data set of around 70k observations. I want to plot them on an x axis with five (or more) factor levels and facet them by three levels of severity.
The main problem is that the majority of observations fall into one severity level (severity = 3), so I can't even read the other two. ylim doesn't help because it changes the results completely instead of turning them into percentages.
Should I split the data myself, or is there a command that could do that for me?
I am attaching an image below to make my problem clearer.
I want to judge each factor based on severity.
Here is a sample of the code.
acc.10 <- read.csv("Accidents2010.csv")
# install.packages("ggplot2")  # run once if the package is not installed
library(ggplot2)
# install.packages("stringr")  # run once if the package is not installed
library(stringr)
acc.10$Road_Type <- as.factor(acc.10$Road_Type)
acc.10$X1st_Road_Class <- as.factor(acc.10$X1st_Road_Class)
ggplot(acc.10, aes(x = Road_Type )) +
geom_bar(width = 0.4) +
ggtitle("Accidents based on Road Type") +
xlab("Road Type")
ggplot(acc.10, aes(x = X1st_Road_Class)) +
geom_bar(width = 0.4) +
ggtitle("Accidents based on 1st Road Class") +
xlab("1st Road Class")
data.10 <- acc.10[which(acc.10$X1st_Road_Class == 3),]
#we will check light conditions in order to
data.10$Light_Conditions <- as.factor(data.10$Light_Conditions)
#we plot to see the distribution
ggplot(data.10, aes(x = Light_Conditions)) +
geom_bar(width = 0.5) +
ggtitle("Accidents based on Light Conditions") +
xlab("Light Conditions")
ggplot(data.10[which(as.numeric(data.10$Accident_Severity) == 3), ],
aes(x = Light_Conditions)) +
geom_bar(width = 0.5) +
ggtitle("Accidents based on Light Conditions") +
xlab("Light Conditions")
# We drill deeper to see whether light conditions are connected to survivability
data.10$Accident_Severity <- as.factor(data.10$Accident_Severity)
ggplot(data.10, aes(x = Light_Conditions, fill = Accident_Severity)) +
geom_bar(width = 0.5) +
ggtitle("Accidents based on Light Conditions and Survivability") +
xlab("Light Conditions")
# We will try to wrap them based on severity instead of the bar graph
ggplot(data.10, aes(x = Light_Conditions)) +
geom_bar(width = 0.5) +
ggtitle("Accidents by Light Conditions, separated by severity") +
facet_wrap(~Accident_Severity) +
xlab("Light Conditions") +
ylab("Total Count")
The data file is here: https://data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data/datafile/4c03ef8d-992d-44df-8543-412d23f3661b/preview
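To make clear what I mean by percentages, here is a minimal sketch on made-up data. The `toy` data frame and the `Share` column are my own illustration and are not part of the real file. It uses base R's `table` and `prop.table` to turn counts into shares within each severity level, so the bars in every panel sum to 1. I am not sure this is the idiomatic ggplot2 way, which is part of my question.

```r
# Toy data standing in for data.10 (values are made up for illustration)
toy <- data.frame(
  Light_Conditions  = factor(c(1, 1, 4, 1, 4, 4, 1, 1, 1, 4)),
  Accident_Severity = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3))
)

# Count each Light_Conditions x Accident_Severity combination
counts <- table(toy$Light_Conditions, toy$Accident_Severity,
                dnn = c("Light_Conditions", "Accident_Severity"))

# Convert counts to shares within each severity level (each column sums to 1),
# then reshape to a long data frame with a hypothetical "Share" column
shares <- as.data.frame(prop.table(counts, margin = 2),
                        responseName = "Share")

# Plot only if ggplot2 is available; geom_col draws the precomputed shares
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(shares, aes(x = Light_Conditions, y = Share)) +
    geom_col(width = 0.5) +
    facet_wrap(~Accident_Severity) +
    xlab("Light Conditions") +
    ylab("Share within severity level")
  print(p)
}
```

An alternative I have seen is `facet_wrap(~Accident_Severity, scales = "free_y")`, which gives each panel its own y axis, but that still shows raw counts rather than percentages.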