
I have a dataset with dates in one field and NAs in another. I created it as a subset of a larger dataset because I need to see whether the NAs are concentrated in one time period or more evenly distributed across the whole time range.

My data looks like this:

User_id |    Date    | app_version
001     | 2016-01-03 | <NA>
002     | 2016-03-03 | <NA>
003     | 2016-02-22 | <NA>
004     | 2016-04-15 | <NA>
...

What I'd like to do is plot a line graph with time on the x-axis and the number of NAs on the y-axis.

Thanks in advance.

jceg316
    Fix your data first, e.g. `library(tidyverse); df %>% group_by(Date) %>% summarise(app_version = sum(is.na(app_version))) %>% ggplot(aes(Date, app_version)) + geom_line()` – alistaire Jan 25 '18 at 16:28

3 Answers


Using dplyr and ggplot2: group your data by date, summarize each group by counting its NA values, then plot. (Here I grouped by Date and added geom_point() to mark each date.)

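library(dplyr)
library(ggplot2)

# Date should be a Date (or POSIXct) column; dplyr rejects POSIXlt columns
df %>% 
  group_by(Date) %>%                                  # one group per date
  summarize(na_count = sum(is.na(app_version))) %>%   # number of NAs in each group
  ggplot(aes(x = Date, y = na_count)) +
  geom_line() +
  geom_point()                                        # mark each date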

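If you would rather avoid the dplyr/ggplot2 dependencies, the same group-and-count idea can be sketched in base R with `aggregate()`. This is a minimal sketch, assuming a data frame `df` with a `Date` column of class `Date` and an `app_version` column; it is rebuilt here from the question's four sample rows, and `na_counts` is an illustrative name.

```r
# Sample data from the question (User_id stays character to keep the leading zeros)
df <- data.frame(
  User_id = c("001", "002", "003", "004"),
  Date = as.Date(c("2016-01-03", "2016-03-03", "2016-02-22", "2016-04-15")),
  app_version = NA_character_,
  stringsAsFactors = FALSE
)

# is.na() marks each missing app_version as TRUE; summing per Date counts the NAs
na_counts <- aggregate(list(na_count = is.na(df$app_version)),
                       by = list(Date = df$Date), FUN = sum)

# Line plus points (type "o" overplots them), with dates on a real time axis
plot(na_counts$Date, na_counts$na_count, type = "o",
     xlab = "Date", ylab = "Number of NAs")
```

Summing a logical vector counts its `TRUE` values, which is the same trick as `sum(is.na(app_version))` in the dplyr pipeline.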

Jake Kaupp
  • Thanks for the reply, however I'm getting this error message: `Error in grouped_df_impl(data, unname(vars), drop) : Column `install_date` is of unsupported class POSIXlt/POSIXt` – jceg316 Jan 25 '18 at 16:42
  • Sounds like you need to properly format `install_date`, try using `as.Date` or `as.POSIXct`. See: https://stackoverflow.com/questions/30063190/problems-with-dplyr-and-posixlt-data – Jake Kaupp Jan 25 '18 at 17:02
  • Thanks, I was using `as.POSIXct()` but on the wrong df :S. It's working now. – jceg316 Jan 25 '18 at 17:08
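The fix in this thread is to convert the `POSIXlt` column before grouping. A minimal base-R sketch of that step, assuming the column is called `install_date` as in the error message and using the question's sample dates; `na_df` and `was_posixlt` are illustrative names:

```r
# Hypothetical frame mirroring the thread: install_date holds POSIXlt values,
# which is what strptime() returns and what group_by() rejected above
na_df <- data.frame(User_id = c("001", "002", "003", "004"),
                    app_version = NA_character_, stringsAsFactors = FALSE)
na_df$install_date <- strptime(c("2016-01-03", "2016-03-03", "2016-02-22", "2016-04-15"),
                               format = "%Y-%m-%d")
was_posixlt <- inherits(na_df$install_date, "POSIXlt")

# Convert to Date (one value per calendar day) before grouping;
# as.POSIXct() would also work if the time of day matters
na_df$install_date <- as.Date(na_df$install_date)

# na_df can now go through the dplyr pipeline, grouping by install_date
```

Converting to `Date` rather than `POSIXct` also means rows from the same day fall into one group, which is usually what a per-day NA count needs.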

Your data

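User_id <- c("001", "002", "003", "004")
Date <- c("2016-01-03", "2016-03-03", "2016-02-22", "2016-04-15")
app_version <- c(NA, NA, NA, NA)

# Build the data frame directly; cbind() would first coerce everything into a character matrix
db <- data.frame(User_id, Date, app_version, stringsAsFactors = FALSE)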

Your graph

# Tabulate the dates of rows where app_version is NA, then draw the counts as a line
plot(table(db[is.na(db$app_version), "Date"]), type = "l")


Terru_theTerror
  • Thanks for the reply, however I'm getting this error: `Error in table(na_df[is.na(na_df$app_version), "install_date"]) : 'names' attribute [75] must be the same length as the vector [11]` – jceg316 Jan 25 '18 at 16:47
  • Please add more details about the code that generates this error. – Terru_theTerror Jan 25 '18 at 16:57
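The `'names' attribute [75] must be the same length as the vector [11]` error is consistent with `install_date` being a `POSIXlt` column. A `POSIXlt` value is internally a list of date-time fields (typically 11, such as `sec`, `mday` and `year`), so converting the column with `as.Date()` first, as in the dplyr thread, is a likely fix. Separately, `plot(table(...))` places the dates at evenly spaced category positions, so uneven gaps in time are not visible. The sketch below, assuming the `db` data frame built in this answer, converts the table's names back to `Date` so the x-axis is a true time scale; `na_table`, `na_dates` and `na_count` are illustrative names.

```r
# Rebuild the answer's sample data (stringsAsFactors = FALSE keeps plain strings)
db <- data.frame(User_id = c("001", "002", "003", "004"),
                 Date = c("2016-01-03", "2016-03-03", "2016-02-22", "2016-04-15"),
                 app_version = NA, stringsAsFactors = FALSE)

# Coerce the date column to Date; as.Date() accepts character, POSIXct and POSIXlt input
db$Date <- as.Date(db$Date)

# Count NAs per date; the table's names hold the dates as "YYYY-MM-DD" strings
na_table <- table(db$Date[is.na(db$app_version)])
na_dates <- as.Date(names(na_table))
na_count <- as.vector(na_table)

# Plot against real dates so uneven gaps in time stay visible
plot(na_dates, na_count, type = "l", xlab = "Date", ylab = "Number of NAs")
```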
library(plyr)
library(ggplot2)
# create a field that reduces each date to just year and month
# (you could break it down by year instead if you'd like)
df$yr_mth <- substr(df$Date, 1, 7)
# summarize the number of NAs per year-month
df1 <- ddply(df, .(yr_mth), summarize,
             num_na = length(which(is.na(app_version))))
# plot yr_mth on x, num_na on y; as.Date() needs a full date, so append day "-01"
ggplot(data = df1, aes(x = as.Date(paste0(yr_mth, "-01")), y = num_na)) +
    geom_point()
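One caveat with grouping on `yr_mth`: months that do not occur in `df` are simply absent from `df1`, and in the question's NA-only subset that means months without NAs disappear from the plot. A base-R alternative, sketched here assuming `Date` is (or can be converted to) a `Date` column, uses `cut()` with `breaks = "month"`. Each date is labelled with the first day of its month, and every month in the range becomes a factor level, so months without NAs show up as 0. `month_bin`, `month_start` and `num_na` are illustrative names.

```r
# Sample data from the question
df <- data.frame(User_id = c("001", "002", "003", "004"),
                 Date = as.Date(c("2016-01-03", "2016-03-03", "2016-02-22", "2016-04-15")),
                 app_version = NA_character_, stringsAsFactors = FALSE)

# cut() labels each date with the first day of its month; the levels cover
# every month between the earliest and latest Date in the whole data frame
month_bin <- cut(df$Date, breaks = "month")

# Subsetting a factor keeps all its levels, so table() reports 0 for
# months that have no NA rows
na_per_month <- table(month_bin[is.na(df$app_version)])
month_start <- as.Date(names(na_per_month))
num_na <- as.vector(na_per_month)

plot(month_start, num_na, type = "o", xlab = "Month", ylab = "Number of NAs")
```

Running `cut()` on the full data frame rather than the NA-only subset lets the month range come from all observations, which is closer to the question's goal of judging whether NAs are spread evenly over time.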
ITM