
I have a data set of around 70k observations. I want to plot them with 5 (or more) different factor levels on the x axis, and facet (wrap) them by three levels of severity.

The main problem is that the majority of observations fall in one severity level (severity = 3), so I can't even read the other two. ylim doesn't help because it changes the results completely instead of turning them into percentages.

Should I do the separation by myself? Or is there any command that could do that for me?

I am attaching an image below to make my problem clearer.

https://i.imgur.com/zYaHom6.png

I want to judge each factor based on severity.

Here is a sample of the code.

acc.10 <- read.csv("Accidents2010.csv")

# install.packages(c("ggplot2", "stringr"))  # run once if the packages are missing
library(ggplot2)
library(stringr)

acc.10$Road_Type <- as.factor(acc.10$Road_Type)
acc.10$X1st_Road_Class <- as.factor(acc.10$X1st_Road_Class)

ggplot(acc.10, aes(x = Road_Type )) +
  geom_bar(width = 0.4) +
  ggtitle("Accidents based on Road Type") +
  xlab("Road Type")

ggplot(acc.10, aes(x = X1st_Road_Class)) +
  geom_bar(width = 0.4) +
  ggtitle("Accidents based on 1st Road Class") +
  xlab("1st Road Class")

data.10 <- acc.10[which(acc.10$X1st_Road_Class == 3),]

#we will check light conditions in order to 
data.10$Light_Conditions <- as.factor(data.10$Light_Conditions)

#we plot to see the distribution
ggplot(data.10, aes(x = Light_Conditions)) +
  geom_bar(width = 0.5) +
  ggtitle("Accidents based on Light Conditions") +
  xlab("Light Conditions")

ggplot(data.10[which(as.numeric(data.10$Accident_Severity) == 3), ],
       aes(x = Light_Conditions)) +
  geom_bar(width = 0.5) +
  ggtitle("Accidents based on Light Conditions") +
  xlab("Light Conditions")

#We drill harder to see if there are connections of survivability

data.10$Accident_Severity <- as.factor(data.10$Accident_Severity)

ggplot(data.10, aes(x = Light_Conditions, fill = Accident_Severity)) +
  geom_bar(width = 0.5) +
  ggtitle("Accidents based on Light Conditions and Survivability") +
  xlab("Light Conditions")

# We will try to wrap them based on severity instead of the bar graph

ggplot(data.10, aes (x = Light_Conditions)) +
  geom_bar(width = 0.5) +
  ggtitle("Accidents separated by severity and Light Conditions") +
  facet_wrap(~Accident_Severity) +
  xlab("Light Conditions") +
  ylab("Total Count")

And the file with data is here: https://data.gov.uk/dataset/cb7ae6f0-4be6-4935-9277-47e5ce24a11f/road-safety-data/datafile/4c03ef8d-992d-44df-8543-412d23f3661b/preview

ThomasTas

1 Answer

Thanks a lot to @Peter K, his solution worked.

The y axis is not in percentages, but that does not really matter because the data are now clearly readable.

Here is the sample code:

ggplot(data.10, aes (x = Light_Conditions)) +
  geom_bar(width = 0.5) +
  ggtitle("Accidents separated by severity and Light Conditions") +
  facet_wrap(~Accident_Severity, scales = 'free_y') +
  xlab("Light Conditions") +
  ylab("Total Count")

The argument scales = 'free_y' in facet_wrap(~Accident_Severity, scales = 'free_y') solved the problem, because it gives each severity panel its own y-axis scale.

https://i.stack.imgur.com/mAIS3.png

The plot is linked above because I don't have the reputation to embed it. Thanks a lot again.
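If a percentage y axis is still wanted, one possible approach is to compute the share of each light condition within each severity level before plotting. The sketch below is only an illustration: `toy` is a small made-up data frame standing in for data.10, and the `Percent` column and the `counts` table are names introduced here, not part of the original code.

```r
library(ggplot2)

# Toy stand-in for data.10 (hypothetical values, not the real accident data)
toy <- data.frame(
  Light_Conditions  = factor(c(1, 1, 4, 1, 4, 4, 1, 1, 1, 4)),
  Accident_Severity = factor(c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3))
)

# Count every Light_Conditions x Accident_Severity combination
counts <- as.data.frame(table(
  Light_Conditions  = toy$Light_Conditions,
  Accident_Severity = toy$Accident_Severity
))

# Share of each light condition within its own severity level, in percent
counts$Percent <- 100 * counts$Freq /
  ave(counts$Freq, counts$Accident_Severity, FUN = sum)

# geom_col() plots the precomputed percentages instead of raw counts
p <- ggplot(counts, aes(x = Light_Conditions, y = Percent)) +
  geom_col(width = 0.5) +
  facet_wrap(~Accident_Severity) +
  xlab("Light Conditions") +
  ylab("Percent of accidents within severity")
print(p)
```

Because every panel then sums to 100, the panels share one comparable scale and scales = 'free_y' is no longer needed for readability.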

ThomasTas