
I have a data frame, data, that looks like this:

    ID  HbA1cRes  Year
     1        65  2003
     2       125  2008
     3        40  2010
     4       110  2007
     5       125  2006
     6       136  2011
     7        20  2012
     8        58  2009
     9        12  2006
    10       123  2008

Patients with HbA1cRes > 65 are classified as "High risk" and the rest (65 or below) as "Low risk". I am trying to do a time series analysis to see the rise and fall of high-risk and low-risk cases over time, using the following code, where Year <- data$REport_YrMonth:

library(tidyverse)
data$risk <- factor(ifelse(data$HbA1cRes > 65, "High risk patients", "Low risk patients"))
ggplot(data, aes(x = Year)) +
  geom_line(aes(y = risk)) +
  labs(title = "Analysis of diabetes' patients status over time",
       y = "Returns %")

However, the output returned is as follows:

[Image: plot produced by the code above]

Any idea what I am doing wrong here?

IronMaiden
  • Please make this a reproducible example (downvote is mine). – tjebo Mar 12 '21 at 09:44
  • Also, you may not want a time series analysis but a survival curve (modelling e.g. with Cox regression). – tjebo Mar 12 '21 at 09:45
  • I want to see how many high-risk and low-risk cases were on the rise or fall over time. The sample data provided should be sufficient to make a reproducible example. – IronMaiden Mar 12 '21 at 10:01
  • https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example - see in particular "Copying original data" on why to use dput. – tjebo Mar 12 '21 at 10:10
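A minimal sketch of the dput() approach mentioned in the comments. The data frame is rebuilt from the table in the question (column names taken from that table; Year assumed to be a plain numeric year). dput() prints R code that recreates the object, and evaluating that printed code should give back an identical data frame.

```r
# Rebuild the sample table from the question (column names assumed from that table)
data <- data.frame(
  ID       = 1:10,
  HbA1cRes = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  Year     = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
)

# dput() prints R code that recreates the object; this is what gets pasted into a question
dput(data)

# Round trip: evaluating the printed code gives back an identical data frame
code <- paste(capture.output(dput(data)), collapse = "\n")
rebuilt <- eval(parse(text = code))
identical(data, rebuilt)
```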

2 Answers


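Your code maps the risk label itself to the y-axis, so geom_line() just draws the categories against Year; it never counts how many patients fall into each group. Instead, count how many "High risk patients" and "Low risk patients" you have in each Year and then plot those counts, one line per risk group.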

library(ggplot2)
library(dplyr)

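data %>%
  # Label each patient by risk group
  mutate(risk = factor(ifelse(HbA1cRes > 65,
                              "High risk patients", "Low risk patients"))) %>%
  # One row per Year/risk combination, with the number of patients in n
  count(Year, risk) %>%
  ggplot(aes(x = Year, y = n, color = risk)) +
  geom_line() +
  labs(title = "Analysis of diabetes' patients status over time",
       y = "Number of patients")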
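One caveat with count(): a year in which a group has no patients gets no row at all, so that group's line jumps straight over the year instead of dropping to zero. A sketch that fills those gaps with tidyr::complete(), assuming Year holds whole calendar years so that full_seq(Year, 1) can list every year between the first and the last:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Sample data from the question (Year assumed to be a numeric calendar year)
data <- data.frame(
  ID       = 1:10,
  HbA1cRes = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  Year     = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
)

counts <- data %>%
  mutate(risk = ifelse(HbA1cRes > 65, "High risk patients", "Low risk patients")) %>%
  count(Year, risk) %>%
  # Add a row with n = 0 for every year/risk combination that has no patients
  complete(Year = full_seq(Year, 1), risk, fill = list(n = 0L))

p <- ggplot(counts, aes(x = Year, y = n, color = risk)) +
  geom_line() +
  labs(title = "Analysis of diabetes' patients status over time",
       y = "Number of patients")
p
```

With the sample data this gives 20 rows (10 years times 2 groups), and 2004 and 2005 show up as zeros for both groups.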
Ronak Shah

The case_when() function may be a more elegant way to classify the data.
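A small sketch of how case_when() behaves here, using the HbA1cRes values from the question. Conditions are checked top to bottom, and TRUE ~ acts as the "else" branch, so the result matches the ifelse() call in the question, including the boundary case: a value of exactly 65 is classified as low risk.

```r
library(dplyr)

hb <- c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123)

# First matching condition wins; TRUE ~ catches everything else
risk_cw <- case_when(
  hb > 65 ~ "high risk",
  TRUE    ~ "low risk"
)

# Equivalent base R version
risk_if <- ifelse(hb > 65, "high risk", "low risk")

identical(risk_cw, risk_if)
table(risk_cw)
```

The advantage of case_when() shows up mainly when more than two categories are needed, since extra conditions are simply added as further lines instead of nesting ifelse() calls.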

Instead of geom_line(), geom_col() or geom_density() may be better options.

library(dplyr)
library(ggplot2)

df <- tibble(
  id = 1:10,
  hb = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  year = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
)

# Classify each patient, then count patients per year and risk group
df <- df %>%
  mutate(
    risk = case_when(
      hb > 65 ~ "high risk",
      TRUE ~ "low risk"
    )
  ) %>%
  count(year, risk)

df %>%
  ggplot(aes(x = year, y = n, fill = risk)) +
  geom_col(position = "dodge") +
  labs(
    title = "Analysis of diabetes' patients status over time",
    y = "Number of patients",
    fill = "Risk Status")

df %>%
  ggplot(aes(x = year, fill = risk)) +
  geom_density(position = "fill") +
  labs(
    title = "Analysis of diabetes' patients status over time",
    y = "Relative density",
    fill = "Risk Status")
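If the main interest is how the balance between the groups shifts from year to year, a sketch that stays with geom_col() but uses position = "fill", so each year's bar is stacked to 1 and shows the share of high- and low-risk patients. The share column is computed only to inspect the numbers; position = "fill" applies the same normalisation in the plot.

```r
library(dplyr)
library(ggplot2)

# Sample data from the question
df <- tibble(
  hb   = c(65, 125, 40, 110, 125, 136, 20, 58, 12, 123),
  year = c(2003, 2008, 2010, 2007, 2006, 2011, 2012, 2009, 2006, 2008)
)

shares <- df %>%
  mutate(risk = ifelse(hb > 65, "high risk", "low risk")) %>%
  count(year, risk) %>%
  # Share of each risk group within its year
  group_by(year) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

p <- ggplot(shares, aes(x = year, y = n, fill = risk)) +
  geom_col(position = "fill") +
  labs(
    title = "Analysis of diabetes' patients status over time",
    y = "Share of patients",
    fill = "Risk Status")
p
```

With only one or two patients per year in the sample, most bars are a single colour; the plot becomes more informative with the full dataset.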

abreums