I want to calculate sum(df$columnD==1)/(number of rows in each subset of columnE), but so far I can't even extract sum(df$columnD==1) within each subset of columnE in the following data frame:
set.seed(10)
A <- seq(from = 1, to = 100, by = 1)
B <- runif(100, -5, 0.2)  # actually I have 900,000 rows
C <- runif(100, 0, 1)
D <- rbinom(100, 1, 0.3)
df <- data.frame(columnA = A, columnB = B, columnC = C, columnD = D)
# Split columnB into deciles: columnE holds the group labels 1 to 10
df$columnE <- cut(df$columnB, quantile(df$columnB, (0:10) / 10), labels = FALSE,
                  include.lowest = TRUE)  # https://www.portfolioprobe.com/2012/12/24/miles-of-iles/
index <- order(df$columnE, decreasing = FALSE)
df <- df[index, ]
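The denominator of the ratio, the number of rows in each subset of columnE, can be checked with base R's table(). This is a minimal sketch that assumes the 100-row example data rebuilt from the question; group_sizes is a name introduced here for illustration only:

```r
# Rebuild the 100-row example data from the question
set.seed(10)
A <- seq(from = 1, to = 100, by = 1)
B <- runif(100, -5, 0.2)
C <- runif(100, 0, 1)
D <- rbinom(100, 1, 0.3)
df <- data.frame(columnA = A, columnB = B, columnC = C, columnD = D)
df$columnE <- cut(df$columnB, quantile(df$columnB, (0:10) / 10),
                  labels = FALSE, include.lowest = TRUE)

# group_sizes (hypothetical name): number of rows in each subset of columnE
group_sizes <- table(df$columnE)
print(group_sizes)
```

Because the cut points are deciles of columnB, the ten subsets should come out roughly equal in size.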
I have tried the following, and none of it works:
sum(df$columnD==1)[df$columnE==1] # No
df$columnE[df$columnE==1][sum(df$columnD==1)] # Trying to extract only from subset 1
(sum(df$columnD==1)/sum(df$columnE==1)) # No
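A likely cause is evaluation order: sum(df$columnD==1) collapses the whole column to a single number before any subsetting happens, so indexing that result with df$columnE==1 cannot recover a per-subset count, and the third attempt divides the overall count by the size of subset 1 only. The sketch below subsets first and sums afterwards, for subset 1 only. It rebuilds the example data, and in_subset1, count_d1, n_subset1 and prop_d1 are illustrative names, not anything from the question:

```r
# Rebuild the 100-row example data from the question
set.seed(10)
A <- seq(from = 1, to = 100, by = 1)
B <- runif(100, -5, 0.2)
C <- runif(100, 0, 1)
D <- rbinom(100, 1, 0.3)
df <- data.frame(columnA = A, columnB = B, columnC = C, columnD = D)
df$columnE <- cut(df$columnB, quantile(df$columnB, (0:10) / 10),
                  labels = FALSE, include.lowest = TRUE)

# Subset first, then count: restrict columnD to the rows of subset 1
in_subset1 <- df$columnE == 1
count_d1 <- sum(df$columnD[in_subset1] == 1)  # rows in subset 1 with columnD == 1
n_subset1 <- sum(in_subset1)                  # number of rows in subset 1
prop_d1 <- count_d1 / n_subset1

# mean() of a logical vector gives the same proportion in one step
prop_d1_alt <- mean(df$columnD[in_subset1] == 1)
print(c(count = count_d1, rows = n_subset1, proportion = prop_d1))
```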
How do I get around this?
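To get the ratio for every subset of columnE at once, one option is base R's tapply(), which splits a vector by a grouping variable and applies a function to each piece. Since the mean of a logical vector is the number of TRUE values divided by its length, mean(columnD == 1) within a group equals sum(columnD == 1) / (rows in that group). A sketch under that assumption, using the example data; prop_by_E and prop_df are illustrative names, and aggregate() is shown as an alternative that returns a data frame:

```r
# Rebuild the 100-row example data from the question
set.seed(10)
A <- seq(from = 1, to = 100, by = 1)
B <- runif(100, -5, 0.2)
C <- runif(100, 0, 1)
D <- rbinom(100, 1, 0.3)
df <- data.frame(columnA = A, columnB = B, columnC = C, columnD = D)
df$columnE <- cut(df$columnB, quantile(df$columnB, (0:10) / 10),
                  labels = FALSE, include.lowest = TRUE)

# Proportion of rows with columnD == 1 within each subset of columnE
prop_by_E <- tapply(df$columnD == 1, df$columnE, mean)
print(prop_by_E)

# Same computation with aggregate(): one row per subset of columnE
prop_df <- aggregate(columnD ~ columnE, data = df,
                     FUN = function(d) mean(d == 1))
print(prop_df)
```

Both calls group in a single vectorised pass rather than looping over subsets by hand, so they should remain practical at 900,000 rows.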