I'm still learning R, and I'm not sure why an NA category appears in my graph, given that I used the table() function to check the values in the column.

Any suggestions for removing the NA category from my graph?

Please find a sample of the code below (not the actual dataset):

*Install and load relevant packages

install.packages("tidyverse")
install.packages("lubridate")
install.packages("ggplot2")
install.packages("tibble")

library(tidyverse)
library(lubridate)
library(ggplot2)
library(tibble)
library(dplyr)

*data frame

all_trips <- tribble(~start, ~end, ~start_name, ~type,
                   "2020-03-22 03:20:20", "2020-03-22 04:10:15", "A", "member",
                   "2020-03-25 01:01:07", "2020-03-25 05:09:45", NA, "member",
                   "2020-03-26 07:09:55", "2020-03-26 08:10:20", "B", "casual",
                   "2020-03-29 09:10:30", "2020-03-29 09:00:20", "A", "casual",
                   "2020-03-30 11:09:18", "2020-03-30 03:40:10", "B", "member")

*generate new columns

all_trips$date <- as.Date(all_trips$start) #The default format is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")

all_trips$ride_length <- difftime(all_trips$end,all_trips$start)

is.factor(all_trips$ride_length)
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length)

*data cleaning

all_trips_v2 <- all_trips[!(all_trips$start_name == "NA" | 
                              all_trips$ride_length<0),]

*data viz

all_trips_v2 %>%
  mutate(weekday = wday(start, label = TRUE)) %>% #creates weekday field using wday()
  group_by(type, weekday) %>% #groups by usertype and weekday
  summarise(number_of_rides = n(), # calculates the number of rides
            average_duration = mean(ride_length)) %>% # calculates the average duration
  arrange(type, weekday) %>% # sorts by user type, then weekday
  ggplot(aes(x = weekday, y = number_of_rides, fill = type)) +
  geom_col(position = "dodge", na.rm = TRUE) +
  scale_x_discrete(na.translate = FALSE) 

Bar chart (linked image)

  • To get a usable answer, it would be helpful to include reproducible sample data and code that produces the ggplot with NAs. – Dan Tarr Apr 23 '22 at 07:43
  • Please provide enough code so others can better understand or reproduce the problem. – Community Apr 24 '22 at 21:32
  • Hi Dan, thanks for the comment. I have added the code in the initial post. Not sure if it is enough but feel free to let me know if you need any further information. Cheers! – Steven Felim Apr 25 '22 at 04:18
  • @StevenFelim Please add the actual code, not a picture of the code. You should provide code as a minimal reproducible example, so that others can easily replicate your problem and find answers. – David Buck Apr 25 '22 at 06:55
  • The answer below should be enough to solve the issue; try to incorporate it into your code. Please copy and paste code as text, then highlight it and press "ctrl+k" to format it as code (so others can readily access it). – Dan Tarr Apr 25 '22 at 06:55
  • @DavidBuck thanks David for the advice. I have added the actual code to my initial post. I only copied and pasted a portion of it as it has 100-ish lines. Please kindly let me know if you have any suggestions. – Steven Felim Apr 25 '22 at 07:31
  • @DanTarr thanks Dan. Adding na.rm and na.translate does make a difference, yet I can still see 'NA' in the legend. Not sure if there is an issue in my cleaning process that triggers this. – Steven Felim Apr 25 '22 at 07:34
  • @StevenFelim Please add a small subset of all_trips_v2 data with suspect NA's like here: library(tidyverse) ctable <-tribble(~Area, ~School, ~Coffeeshop, ~Hospitals, ~Parks, ~Total, "Washington", 142, 120, 20, 20, 302, "Seattle", 120, 140, 30, 40, 330) – Dan Tarr Apr 25 '22 at 08:02
  • @DanTarr Hi Dan. I have updated my initial post with more details. I have also added a link to the data set. Hopefully, it will be enough to help you diagnose and replicate my analysis. Thanks heaps. – Steven Felim Apr 26 '22 at 02:51
  • @StevenFelim, please add a minimal reproducible dataset, a small sample that produces the error. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – Dan Tarr Apr 26 '22 at 04:06
  • @DanTarr Hi Dan, I have updated it.. Seems like there is an issue with my data cleaning. Kindly advise me on this. Thanks! – Steven Felim Apr 27 '22 at 03:15
  • @SteveFelim It works with all_trips. Also, start_name has an NA, but it is not used to produce the ggplot. – Dan Tarr Apr 27 '22 at 17:08
  • @DanTarr Yes, it does exclude NA and ride_length values below zero. But for some reason, when I run View(all_trips_v2), there are three rows, one of which is an NA row. And you can see the bar above (last sentence of the post). – Steven Felim Apr 28 '22 at 02:03
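
The extra NA row described in the comment above can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical three-row data frame `df` (not the actual dataset): comparing a missing value with the string "NA" returns NA rather than TRUE, and a logical index that contains NA returns a row filled with NA.

```r
# Hypothetical data frame; `df` and its values are illustrative only.
df <- data.frame(
  start_name  = c("A", NA, "B"),
  ride_length = c(10, 20, -5)
)

# Comparing a missing value with the string "NA" gives NA, not TRUE.
NA == "NA"

# The index is therefore TRUE, NA, FALSE; the NA entry comes back
# as a row in which every column is NA.
keep <- !(df$start_name == "NA" | df$ride_length < 0)
df[keep, ]

# Testing for missingness with is.na() drops that row instead.
clean <- df[!is.na(df$start_name) & df$ride_length >= 0, ]
clean
```

If this is what happens in the real data, the all-NA row would carry an NA weekday and an NA type into the plot, which would explain the NA bar.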

1 Answer

Adding the na.rm and na.translate arguments removes missing values from the bar chart without a warning message, as shown here:

tibble(x = rep(c("One", "Two", "Two", NA), 2), Group = rep(c("A", "B"), each = 4)) %>%
  ggplot(aes(x, fill = Group)) +
  labs(title = "Sample Group Bar Chart with NA's Removed") +
  geom_bar(stat = "count", position = position_dodge(), na.rm = TRUE) +
  scale_x_discrete(na.translate = FALSE)

Dan Tarr
  • Hi, Thank you for your response! Indeed it works! Thank you for the solution. However, do you know how to remove the 'NA' from the legend as well? – Steven Felim Apr 23 '22 at 09:09
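
On the legend question: one likely cause, assuming the data passed to ggplot still contains rows where the fill variable is missing, is that the fill scale creates its own NA entry. A minimal sketch of one way around it, using a hypothetical summary table `rides` (not the actual dataset), is to drop those rows before plotting:

```r
library(dplyr)
library(ggplot2)

# Hypothetical summary table; `rides` and its values are illustrative only.
rides <- data.frame(
  weekday         = c("Sun", "Wed", NA),
  type            = c("member", "casual", NA),
  number_of_rides = c(1L, 1L, 1L)
)

# Drop rows whose x or fill value is missing, so that neither the axis
# nor the legend receives an NA entry.
rides_clean <- rides %>%
  filter(!is.na(weekday), !is.na(type))

p <- ggplot(rides_clean, aes(x = weekday, y = number_of_rides, fill = type)) +
  geom_col(position = "dodge")
p
```

Filtering keeps the fix in the data rather than in the plot. If the rows have to stay in the data, passing na.translate = FALSE to the fill scale, for example scale_fill_discrete(na.translate = FALSE), should hide the legend entry in the same way it does for the x axis.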