I have simulated some data in R to represent disease prevalence curves.
I would like the data to plot as smoothed curves, so in ggplot2 I have used the function geom_smooth. The problem is that I would like each of the lines to begin at x = 0, y = 0, but geom_smooth has them start at varying y values from 0.1 to 0.15.
Note: I am not asking how to make the axis begin at 0. I would like the lines on the graph themselves to begin at or near zero, so this is not a duplicate.
Is there a way to constrain how the smoothing is done, so that the lines begin at the same point (0,0), or close to it, similar to this plot?
The script I have used to simulate the data and plot it is:
library(ggplot2)
library(tidyverse)
library(colorspace)
# Set the random seed for reproducibility
set.seed(42)
n <- 80 # Number of data points to simulate
age <- seq(0, 80, length.out = n) # Create the "age" variable ranging from 0 to 80
# Calculate "seroprev" as a natural logarithmic sequence that plateaus at 0.7 after the 40th observation
max_seroprev <- 0.7
seroprev <- pmin(max_seroprev, log(age + 1) / log(40 + 1) * max_seroprev)
# Create a data frame to store the simulated data for year 1988
simulated_data.1988 <- data.frame(seroprev, age)
simulated_data.1988$year <- "1988"
#### 1990 ####
max_seroprev <- 0.65
seroprev <- pmin(max_seroprev, log(age + 1) / log(40 + 1) * max_seroprev)
simulated_data.1990 <- data.frame(seroprev, age)
simulated_data.1990$year <- "1990"
#### 2003 ####
max_seroprev <- 0.53
seroprev <- pmin(max_seroprev, log(age + 1) / log(40 + 1) * max_seroprev)
simulated_data.2003 <- data.frame(seroprev, age)
simulated_data.2003$year <- "2003"
#### 2008 ####
max_seroprev <- 0.45
seroprev <- pmin(max_seroprev, log(age + 1) / log(40 + 1) * max_seroprev)
simulated_data.2008 <- data.frame(seroprev, age)
simulated_data.2008$year <- "2008"
#### 2011 ####
# Initialize "seroprev" with zeros
seroprev <- rep(0, n)
# Calculate "seroprev" as a natural logarithmic sequence starting from age 5
start_age <- 5
seroprev[start_age:n] <- log(1:(n - start_age) + 1) / log((n - start_age) + 1) * 0.4
# Round the "seroprev" values to two decimal places
seroprev <- round(seroprev, 2)
simulated_data.2011 <- data.frame(seroprev, age)
simulated_data.2011$year <- "2011"
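# Row 80 is wrong: the assignment above supplies 75 values for 76 slots, so R
# recycles the first value into the last slot. Drop that row and re-add it.
simulated_data.2011 <- simulated_data.2011[-c(80), ]
simulated_data.2011 <- simulated_data.2011 %>%
  add_row(age = 80, seroprev = 0.4, year = "2011")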
# Bind each simulated year into one data frame
sim.full <- rbind(simulated_data.1988, simulated_data.1990,
                  simulated_data.2003, simulated_data.2008,
                  simulated_data.2011)
sim.full$year <- as.factor(sim.full$year)
ggplot(sim.full, aes(x = age, y = seroprev, colour = year, group = year)) +
  geom_smooth(se = FALSE) +
  xlab("Age (years)") +
  ylab("Seroprevalence") +
  scale_color_manual(values = c("1988" = "#990000",
                                "1990" = "red",
                                "2003" = "orange",
                                "2008" = "#FFCC00",
                                "2011" = "yellow")) +
  ylim(0, 1) +
  theme_classic() +
  theme(legend.title = element_blank(),
        legend.position = c(0.1, 0.9), # Position legend top left
        legend.text = element_text(size = 15),
        axis.title.x = element_text(face = "bold", size = 15),
        axis.title.y = element_text(face = "bold", size = 15),
        axis.text = element_text(face = "bold", size = 12))
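To make the goal concrete, here is a minimal sketch of one possible way to pin a smoothed curve at the origin. It swaps the default loess smoother for a linear model on a natural spline basis with no intercept, rather than constraining loess itself. It assumes the smallest age in each group is 0, so that the left boundary knot of the spline sits at age 0 and every basis column is zero there. The data frame `demo` and the choice `df = 5` are illustrative assumptions, not part of the script above.

```r
library(ggplot2)
library(splines)

# Hypothetical stand-in for a single year of the simulated data
age <- seq(0, 80, length.out = 80)
demo <- data.frame(
  age = age,
  seroprev = pmin(0.7, log(age + 1) / log(40 + 1) * 0.7)
)

# Natural cubic spline basis without an intercept: all basis columns equal 0
# at the left boundary knot (the minimum age, assumed to be 0), so the fitted
# curve is forced through (0, 0). df = 5 is an arbitrary choice.
fit <- lm(seroprev ~ 0 + ns(age, df = 5), data = demo)
origin_fit <- unname(predict(fit, newdata = data.frame(age = 0)))

# The same no-intercept formula passed to geom_smooth
p <- ggplot(demo, aes(x = age, y = seroprev)) +
  geom_smooth(method = "lm", formula = y ~ 0 + ns(x, df = 5), se = FALSE)
```

With `colour = year, group = year` in the aesthetics, geom_smooth fits the model separately per year, so each curve would start at zero provided each year includes age 0. The resulting curves will not match the loess fits elsewhere, and the plateau may be reproduced less sharply depending on `df`.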